Computing over encrypted spatial data generated by IoT

Proliferation of IoT devices produces the enormous amount of data that need to be stored on clouds. A main focus of this paper is to ensure the secrec...

11 downloads 208 Views 4MB Size

Download PDF

Telecommunication Systems https://doi.org/10.1007/s11235-018-0479-4

Computing over encrypted spatial data generated by IoT Suresh V. Limkar1 · Rakesh Kumar Jha1

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Abstract Proliferation of IoT devices produces the enormous amount of data that need to be stored on clouds. A main focus of this paper is to ensure the secrecy of data, while it is in transit through unsecure communication media from edge devices to cloud and to provide security to the data once it is stored on public cloud. A new techniques based on computing over encrypted data i.e. homomorphic encryption seems to be promising methods. However, most of the previous works supporting computing over encrypted data are neither efficient nor compatible with IoT data and real time spatial data streams because there is a huge difference between normal data and spatial data. In this paper, we proposed a framework for computing over IoT generated encrypted spatial data. In order to provide computation over encrypted data first it needs to be indexed in standard data structure. For indexing encrypted data, we used R tree and its variants. We also proposed a method of most efficient and scalable, parallel construction of R trees and its variants on real time encrypted spatial data. We fired spatial range queries on encrypted spatial data. Specifically, the spatial range query execution time over encrypted spatial data of our proposed scheme is extremely efficient which takes slightly more time as taken by normal spatial range query executed over non-encrypted real time spatial data. Our scheme is not only efficient, but also highly compatible and scalable with IoT generated spatial data. Moreover, we rigorously define the scalability, query performance time, analyze the security of our schemes, and also conduct extensive experiments with a real time spatial dataset to demonstrate the performance of our schemes. Keywords Data security · Cloud computing · IoT · Zetta · ADS-B · R tree · MapReduce · Apache Spark · Homomorphic encryption

1 Introduction Internet of things (IoT) [1,2] is a fast-growing collection of internet-connected sensors embedded in a wide-ranging variety of physical objects i.e. things. Whereas things can literally be any physical object (animate or inanimate) on the planet to which you could connect or embed a sensor. Sensors can take a large number of possible measurements. Internet connectivity to the things can be either wired or wireless. These things are producing enormous amounts of new, structured, unstructured, real-time big data. The IoT will never go to sleep; billions of internet connected things will become an information collector and also huge information producer.

B

Rakesh Kumar Jha [email protected] Suresh V. Limkar [email protected]

1

School of Electronics and Communication Engineering, Shri Mata Vaishno Devi University, Katra, J&K, India

Gartner, Inc. [3] predicts that 8.4 billion devices will be in use across the world till the end of 2017, i.e. 31% more than in the year 2016, and will reach 20.4 billion by the end of the year 2020. Whereas it generate 400 ZBs of data up to year 2018, up from 113.4ZBs in 2013. A zettabyte is a trillion gigabytes. IDC [4] forecasted that by 2020, machine generated data will be 1/10th of the total generated world’s data. Day by day there is enormous improvement in the sensor and communication technology, resulting into generating new sources of geospatial big data [5,6]. The geospatial data are found in the various domains such as military, politics, medical, marketing, resource, environment and economy, etc., which is highly confidential and sensitive, if such information is illegally taken, forged or spoiled leads to countless loss. Geospatial big data [7] is type of spatial data which surpasses the capacity of current computing systems. Geospatial data contributes a major part of today’s big data and the volume of such data is increasing rapidly by 20% in every year

123

S. V. Limkar, R. K. Jha

Fig. 1 Gartner’s hype cycle [9]

[8]. From Garner’s hype cycle in Fig. 1 [9], it is clear that geospatial big data is at peak stage of expectation which is highlighted in red colored rectangle. It also shows in memory database management system is a necessity, which is highlighted in blue colored rectangle. For example, FitBit application which is used for common health monitoring. The FitBit gathers various types of data such as quality of sleep, step-counts, heart rate and location. This type of data can be used to find personal private information such as whether user’s are suffering from chronic diseases, user’s social interactions, user’s mental health conditions. Inferring such personal information might be of great value, from the user’s perspective. However, the user not at all interested in sharing such information with others i.e. corporate service provider. User always try to avoid unauthorized sharing of data, along with disclosure of sensitive information. For securing spatial data, proper encryption mechanism must be applied. Very few researchers paid attention towards encrypting spatial data. Necessity of cloud storage is unavoidable for the dramatically growing IoT data, not only for storage purpose, but also to make this data available and easily accessible to other devices and services. As it is not easy nor even lucrative for number of companies and organizations to maintain a huge

123

amount of data on their local servers. Thus, it is very common to see companies and organizations [10], outsourcing their data including spatial big data to public cloud providers such as Amazon [11] and Google [12] cloud. However, the violation of privacy and security incidents in the cloud ware observed, outsourcing data to the cloud leads to compromising the security and privacy [13] and also it is very easy for an inside attacker (e.g. cloud administrator) to compromise the security. For example, compromising the spatial datasets outsourced by Foursquare through the transgression of Amazon Web Services (AWS) would put in risk millions of users’ private location data. So researchers are continuously working hard to design a security mechanism where we can delegate the ability to process our data to untrusted party, without giving away access to it i.e. in simple way if we wish, not to compromise the privacy and security of dataset, then apply traditional encryption techniques [14] before outsourcing dataset to cloud, but certainly it introduces difficulties in terms of search functionalities over encrypted data. However, the disadvantage of this approach is twofold: (i) in order to process the geospatial big data computing power of the cloud is not considered; and (ii) while analyzing the geospatial big data, user requires high bandwidth and high computing power on the user-side, and it results in unde-

Computing over encrypted spatial data generated by IoT Table 1 Spatial data versus non-spatial data Spatial data

Non-spatial data

Data that describe location. Which are expressed with the graphic primitives i.e. points, polygons or pixel’s

Data that is independent of all geometric consideration

For e.g. It includes location, shape, size and orientation

For e.g. person’s height, mass, weight and age are non-spatial data

These types of data are multidimensional and auto correlated

These types of data are single dimensional and independent

Spatial data are difficult to create

Non-spatial data are easy to create

A spatial database is often used as a storage container for geospatial data

Non-spatial database is often used as a storage container for non-spatial data

Spatial database has the ability to store and access both Location/Spatial Information and Attributes/Non-spatial Information

Non-spatial Database: has the ability to store and access only Attributes/Non-Spatial Information

A spatial database provides special functions and indexes for querying and manipulating geospatial data using something like Structured Query Language (SQL)

Non-spatial database doesn’t provide such functions and indexes

A spatial database uses spatial query

Non-spatial database don’t support spatial queries

Examples

Examples

1. SpatiaLite

1. SQLite

2. Oracle Spatial/Locator

2. Oracle

3. MySQL Spatial

3. MySQL

4. PostGIS

4. PostgreSQL

5. DB2 Spatial

5. IBM DB2/Informix

6. MsSQL Spatial

6. MsSQL server

7. Spatial Hadoop

7. Hadoop 8. MS Access

sired application delays because first user needs to download, decrypt, and finally process the geospatial big data. Furthermore, users lose the power to search their geospatial big data, which is the most important task considered for the growing geospatial data. Again, traditional encryption techniques are well appropriate for non-spatial data, as spatial data possesses some unique different properties. Difference between spatial data and non-spatial data is shown in Table 1. As per Forrester’s analysis shown in Fig. 2, IoT encryption seems to be attained a significant success in past and predicted 3–5 years to reach to full pick level i.e. at equilibrium phase. Moreover, [15] suggested all IoT encryption must be go through the full encryption key lifecycle managing processes, because poor key management will reduce overall security. Rest of the paper is organized as follows. Section 2 discusses the back ground of the work, discussion on system model is given in Sect. 3, Sect. 4 discusses the different indexing structure used for geospatial data, and Sect. 5 discusses the different encryption methods required for storing data on cloud followed by proposed work. Experimental setup is given in Sect. 6, threat model in Sect. 7. Section 8 discuss security analysis with the help of attack model and lastly

Fig. 2 Forrester’s analysis [16]

Sect. 9 gives the experimental result followed by conclusion and future scope.

123

S. V. Limkar, R. K. Jha

2 Background of research Need of cloud storage is unavoidable for the dramatically growing IoT data, not only for storage purpose, but also to make this data available and easily accessible to other devices and services. As it is not easy nor even lucrative for number of companies and organizations to maintain a huge amount of data on their local servers. Thus, it is very common to see companies and organizations, outsourcing their data including spatial big data to public cloud providers such as Amazon [13] and Google [14] cloud. However, the violation of privacy and security incidents in the cloud are observed, outsourcing data to the cloud leads to compromising the security and privacy [29] and also it is very easy for an inside attacker (e.g. cloud administrator) to compromise the security. For example, compromising the spatial datasets outsourced by Foursquare through the transgression of Amazon Web Services (AWS) would put in risk millions of users’ private location data. organization want to compute on the data, the data must be first downloaded and decrypted, resulting not utilizing the advantage of cloud computing. In order to utilize the advantage of the cloud computing, companies must use homomorphic encryption [79]. It is a method of encryption which permits computations on encrypted data, without decrypting it. Searchable Encryption [17] is an efficient approach to facilitate search functionalities on cipher text over public cloud without decryption. Specifically with searchable encryption scheme, user can get accurate search results from an honest-but curious server without disclosing private data or queries. A various searchable encryption techniques [18– 26] have been proposed, most of them emphasis on common SQL queries such as keyword search, which are not suitable for geospatial data, as we have already mentioned normal dataset and geospatial dataset are having different properties. In recent years, a few searchable encryption techniques [27–31] have drawn their attentions particularly to geometric range queries over spatial datasets, where a geometric range query can be executed such as to retrieves points inside a geometric shape i.e. circle or a polygon [32]. Researchers in [28] able to execute circular range query on encrypted geospatial data using a symmetric-key and verified its security under selective chosen-plaintext attacks. Limitation of this scheme is, it can execute only circular range query. Researchers in [30] designed a symmetric-key searchable encryption scheme that can support geometric range queries on encrypted spatial data, that too applicable to any general geometric shape in order to overcome the limitation of [28] and verified its security under selective chosen-plaintext attacks.

123

Researchers in [31] proposed the concept of geometrically searchable encryption FastGeo to boost search time over 100 times by indexing spatial data in to hash table. Researchers in [33,34] proposed a method where two parties can privately compute and check whether a particular point is inside a geometric shape or not based on secure multiple party computation. Similarly, authors in [35,36] proposed a method where two or more users can securely verify whether one user is inside circle, which can help two users to securely confirm whether one user is inside a circle of another user based their locations, is also based on Secure Multi-party Computation [37]. Researchers in [38] proposed a novel machine to machine reputation system which enables machines to securely compute the global trustworthiness of machines in a complete decentralized and that too preserving privacy. The proposed model ensures that the machines cannot learn anything about the participating machines other than the aggregated global reputation score which is not privacy sensitive. Researchers in [39] proposed a novel system for the effective spam detection and blocking of spammers with the help of the decentralization and homomorphic cryptography. Proposed system is easily scalable to handle large number and also improves the rate of detection of spam and block the spammers with less overhead and very small communication cost. Researchers in [40] proposed a decentralized reputation aggregation protocol that enables multiple telecommunication operators (OPs) to actively take part in a collaboration process in the absence of trusted third party centralized system and without developing a predefined trust relationship with other OPs. With the exchange of cryptographic scores among OPs collaboration in between multiple Ops is achieved. Researchers in [41] presented a survey of primary security issues for IoT and categorize most popular security issues with respect to layered architecture of IoT. It also outline the security requirements for IoT along with the existing attack models and threats with its standard solutions. It also discuss the blockchain technology, which solve most of the IoT security problems. Above work [32–36] are based on secure multi-party computation, which require extensive rounds of interactions amongst all the parties involved in the computation. However, none of the above techniques cannot support execution of spatial queries over IoT generated data and is still remains open challenge. Limitation of existing work are listed out below. (i) Key management is tedious task Techniques mentioned in [28], [30], [31], are supported for private data produced by only one device. For e.g. if number of data owners are 10, then data owners need to acquire

Computing over encrypted spatial data generated by IoT

10 different tokens for encrypting there query, cloud server works on 10 different data owner and return 10 different result, which takes more time for searching and collecting the result from10 different data owners, ultimately results into more utilization of network bandwidth and cloud resources. Managing a key amongst number of private data producer is tedious task. These techniques are not suitable for IoT devices, as in case of IoT, billions of devices i.e. data producers exist. (ii) Based on conventional encryption Conventional encryption is not fast as homomorphic encryption because homomorphic encryption is based on pure mathematical operations. (iii) Based on two dimensional dataset All existing papers are based on two dimensional spatial data i.e. latitude and longitude. Whereas we considered three dimensional spatial data i.e. latitude, longitude and altitude.

2.1 Contributions The first optimal goal of this paper is to provide at most security to IoT generated geospatial data while it is in transit from edge devices to cloud and from third party cloud i.e. public cloud. This is accomplished by using homomorphic encryption. The second optimal goal of this paper is to reduce the time required for indexing the encrypted geospatial data generated from IoT devices for efficient processing spatial queries on the cloud. This is accomplished by using most advanced open source parallel architecture framework, i.e. MapReduce and Apache Spark and the third optimal goal of this paper is to reduce the execution time required for executing spatial query on encrypted geospatial data. Our technical contributions in this paper can be summarized as follows. I. Proposed an efficient parallel architecture framework for processing real time encrypted geospatial big data generated by internet connected devices for the aviation industry. This is achieved through indexing encrypted geospatial big data in R tree and its variants. II. Proposed framework is simulated in IoTZetta architecture, which eliminates the existing issues in IOT. III. Proposed a novel algorithm for parallel merging of two R-tree and its variants using the two frameworks i.e. Apache MApReduce and Apache Spark framework. IV. We experimentally evaluated and compared our proposed algorithms in terms of execution time, insertion time, update time, search time and scalability with respect to normal IoT data (i.e. non-encrypted IoT generated Geospatial big data). We used various metrics to measure the quality of the resulting trees.

V. To the best of our knowledge, this is the first paper which exploit the advantages of Apache Spark over MapReduce framework for encrypted geospatial big data processing with in-memory computation and ondisk computation. VI. To the best of our knowledge, this is the first paper which exploit the advantages of homomorphic encryption for IoT generated real time spatial data. Rest of the paper is organized as follows. Discussion on system model of proposed work is given in Sect. 3, which covers comprehensive survey on various IoT platforms, issues in existing IoT Platforms, Simulation of proposed work in IoT Zetta, IoT Zetta component stack. Section 4 discusses the different indexing structure used for geospatial data. Section 5 discusses the different encryption methods required for storing data on cloud followed by proposed work. Experimental setup is given in Sect. 6. Section 7 discuss about the theoretical analysis of the proposed system and lastly Sect. 8 gives the experimental result followed by conclusion and future scope.

3 System model for proposed framework Various IoT platforms [42,43] are studied and summarized in Table 2 with its features. We selected IoTZetta architecture because it overcomes the issues in existing IoT, which are described in Table 3.

3.1 Zetta architecture [52] Zetta is an open source platform completely developed in Node.js [55]. It is mainly used for creating IoT server which can run anywhere across the geographically scattered computers. Zetta is the amalgamation of three technologies i.e. websockets, reactive programming and REST API. Such combination is highly suitable for grouping large no. of devices at one place for real time application such as smart home and smart farming. Zetta is a lightweight process that resides in more than one place such as on the edge of the network, on a hardware hub, Raspberry Pi, BeagleBone, data center or on a cloud server. Zetta will mediate protocols in the hardware hub, i.e. nothing but separation between API technology and Device protocol technology. Zetta process create a secure link between each other using Z2Z protocol. Zetta maintains a layered approach, keeping components pluggable and extensible. Zetta is developer friendly because it is based on the API.

3.2 Simulation We simulated aviation industry application in Zetta. Overall simulation environment comprises three modules; such as

123

S. V. Limkar, R. K. Jha Table 2 Summary of various IoT platform Platform

Features Proprietary things

Open things

Private cloud

Xively [44]

ThingSpeak [45]

IoT Toolkit [46]

n/a

Open.Sen.se [47]

openPicus [48]

NimBits [49]

Public cloud

n/a

Free

Application API

Last update

Ready to use

n/a

n/a

n/a

Dec 13

n/a

Open source

n/a

n/a

n/a

n/a

n/a

n/a

Partly

n/a

Mar 14

NetLab [50]

n/a

Dec 13

OpenIoT [51]

Dec 14

IoT Zetta [52]

Apr 17

IoTCloud [53]

n/a

n/a

May 13

Table 3 IoT Issues with its description’s and solution provided by Zetta architecture S. no.

Issues in IoT

Description

Solution provided by Zetta

1

Large number of protocols and its interfaces are available [42,43]

Existence of large number of protocols and its interfaces, amongst them which one is to use? Whether a provision is made for standard future proofing and security provision against these new protocols or not made is unknown

Use of HTTP protocol to communicate with all devices. Every device gets an API generated from Node.js

2

IoT Consist of large number of devices [1,2]

Large no of devices are involved in IoT system, there is necessity to have a standard mechanism to coordinate these large number of devices by providing various types of communication such as D2D: Device to Device, D2G: Device to Gateways, D2C: Device to cloud, S2S: server to server communication in large IoT systems and must improve on emergent behaviors of these devices in large systems

With the use of REST API, web sockets and reactive programming, leads to easy coordination of devices across the globe

3

IoT Devices generate huge amount of data [54]

In upcoming years IoT will become the biggest source of data on the planet and we must be prepared for handling such huge data. We must address issues related to IoT Data such as, which data need to be stored, what techniques we must adopt for storing, how to extract meaningful information and learn from this IoT Data

Broadcasting data over web sockets made data analyzing more easy

IoTZetta Clients, IoTZetta Server and IoTZetta Cloud Server. All these modules are discussed in details.

3.2.1 IoTZetta clients Skies around the world are increasingly crowded because of rising number of flights. Automatic dependent surveillance— broadcast (ADS-B) enables the widespread use of satellite based GPS technology in the navigation. It provides Air

123

traffic control (ATC) information that help aircraft pilot to efficiently navigate in increasingly congested space. ADS-B allows aircrafts to determine its exact location using Global Positioning System (GPS). This received information is broadcast to other nearby aircrafts and ADS-B receivers, which relay the information to ATC. To make this possible, aircraft will be required to continuously broadcast their position, i.e. through ADS-B Transponder and optionally additional equipment that enables the aircraft to receive those from another plane as well as information from ground

Computing over encrypted spatial data generated by IoT

(i) Driver It facilitates aircrafts to be represented in terms of a state machine and facilitates interaction through APIs such as query devices to get real time sensor data streams, inter-

ADS-B Receiver

ADS-B Receiver

Driver Server Extensions

Scouts

Zea Server Clients Registry

Key Pair Generator

IoT Zea Cloud Server

GPS Satellite

ADS- B receiver

ADS - B Transponder equipped aircra

Fig. 3 Proposed system model for real time data encryption

Flightradar24 Web Streaming

Driver Server Extensions Scouts Kay Pair Generetor

Clients Registry

IoT Zea Server HTTP API (JSON) & Websocket

It is an uppermost level of abstraction in Zetta. It runs on any hardware such as Intel Edison, Raspberry PI, etc. It generates APIs for each devices to whom it has to connect and coordinates interaction amongst all devices. Mainly it consist of the following components.

ADS-B Receiver

HTTP API (JSON) & Websocket

3.2.2 IoTZetta server

IoT Zea Clients

IoT Zea API

stations. This continuous instantaneous cycle to broadcast give ATC and pilot with complete and more up to date picture of the airspace. ADS-B provides updates 12 times more frequently than the radar, which means better coordinate for takeoffs and landings, less time on the ground or circling in the air, nor directs routs and fewer delays. Increased airspace efficiency is another cheap benefit of ADS-B. Earlier systems are limited to, land based areas, leaving 70% of the world area uncovered creating inflexible and inefficient routing over much of the world’s aerospace. These global blind spots disappear in the beginning of year 2017. Currently, ADS-B receiver depends on satellite relay signals from all ADS-B equipped aircrafts in the ATC worldwide. This provides new real time visibility of ADS-B equipped aircraft in any flight information region, including current surveilled ocean, polar, dessert and not necessary space. Satellite based ADS-B will enable the optimization of the aircraft’s path, increase operational and fuel efficiency for airlines and significantly enhance safety all in less investment and in the less infrastructure. With the adoption of satellite based ADS-B surveillance no aircraft will ever be lost again. In this module aircraft is connected to the internet through ADS-B transponder and ADS-B receiver. ADS-B Transponder is installed in the aircraft for receiving its current location information from a constellation of GPS Satellites and sending its current location to ADS-B receiver. The ADS - B receiver is used for real time tracking of aircraft position from the base station. The ADS - B receiver is connected to the Zetta server though IoT API. We simulated two types of IoTZetta clients. In first scenarios we used three physical ADS-B receiver which could receive the real time aircrafts location update information within its range but it could fetch only few aircrafts. First Simulated scenario is shown in Fig. 3. The second scenario is simulated, in order to get more number of aircrafts encrypted real-time position updates. For this we used real time database provided by global flight systems, i.e. flightradar24 [56]. It provides real-time position updates of thousands of aircrafts across the world. Second simulated scenarios are shown in above Fig. 4.

IoT Zea Cloud Server

Fig. 4 Proposed system model for real time data encryption

123

S. V. Limkar, R. K. Jha

act with aircrafts to know its current status and locations, provides links between servers, which establishes a secure connection between two Zetta servers for this it uses Z2Z protocol for efficiently proxy API requests, and streaming data between servers. (ii) Scouts It mainly involves in the discovery mechanism, i.e. searching for any new aircrafts and if it founds any new aircraft in the sky then it report back to the server. (iii) Server extensions These are the pluggable models required for adding additional APIs and providing additional securities to APIs. (iv) Registry It is a small local database which holds information about all the aircrafts which are in the sky and the server itself. (v) Key pair generation It mainly involves in generating key pairs for the homomorphic encryption and decryption.

Application Layer

JSON API

Mediation Layer

IoT Zetta Drivers

Protocol layer

Using ADS-B Device Specific protocol

Network and Transport Layer

Using Serial (Web API)

3.2.3 IoTZetta cloud server IoTZetta server is connected to the IoTZetta cloud server through HTTP APIs (JSON) and websocket. Mainly it holds all the real time encrypted spatial data received from IoT Zetta server for further processing and doing analytics.

3.3 IoTZetta components IoT Zetta components are stacked in five layers. In physical and data link layer ADS-B receiver is used to receive the aircraft’s real time data in its range through universal access transceivers (UAT) and 1090 MHz extended squitter. The data which is received at physical and data link layer is forwarded to network and transport layer using serial communication and web APIs. Protocol layer uses ADS-B receiver specific protocol, whereas mediation layer uses IOTZetta drivers. Application layer composed of JSON APIs. Details IoT Zetta components is shown in the Fig. 5.

4 Realization of diﬀerent indexing structure for proposed framework In order to index the IoT generated encrypted spatial data, we used R tree [57] and its variants [58,59]. These data structures have emerged as practically most efficient indexing methods and widely accepted, adopted in geospatial data management. Most popular spatial indexing techniques are summarized in Table 4.

123

Physical and Data Link layer

Using Universal Access Transceiver (UAT) & 1090 MHz Extended Squitter

Fig. 5 IoTZetta components

4.1 Introduction to R tree R trees [57] are similar to B+ trees, which is a height balanced tree. It consist of three main parts i.e. root node, internal node and leaf node. Root node holds the pointer reference to the larges region(s). Internal nodes hierarchically stores pointers to their child nodes for storing data bounded with rectangles parent’s area. Leaf nodes are used to stores the actual data with respective minimum bounding region (MBR). MBR is basically a sub region within the entire space that group data efficiently. Just like in B+ trees, in R tree each node contains at least m and at most M entries, where m ≤ M/2. R tree variants are evolved for improving the performance by fine-tuning some parameters, such as in R* trees [59], which is widely accepted for achieving the best performance, a number of parameters were introduced such as forced re-insertions during insertions, buffering and optimization criteria for splitting/merging nodes and adjusting the complex MBRs. The comprehensive comparison amongst R tree and its variants is given in the Table 5.

Computing over encrypted spatial data generated by IoT Table 4 Summary of various spatial indexing techniques Characteristics

Major spatial indexes Grid file [60–62]

Quad tree [63,64]

R-tree [57,65]

Partition technique

Space-based

Space-based

Data-based

Handling skewness

Very poor

Moderate

Very good

Complexity

Index creation and tuning is more complex

Tuning is more complex

Index creation and tuning are easier

Storage

More storage is required

More storage is required

Less storage is required

Whole-earth model

Possible

Not-possible

Possible

Hierarchical structure

Absent

Present

Present

Multidimensional

No

No

Yes

Query processing

Less efficient

Less efficient

More efficient

Parallelization friendly

Substantial

Moderate

Moderate

Table 5 Comprehensive comparison amongst R tree and its variants S. no.

1

Characteristics

Overlap

R tree and its variants R tree [57]

R+ tree [58]

R∗ tree [59]

More overlap

Minimum overlap as compared to R tree

Minimum overlap as compared to R tree and R+ tree

2

Pruning

Less pruning

Minimum pruning

Improved pruning

3

Dead space

More dead space

Minimum dead space compared to R tree

Minimum dead space compared to R tree and R+ tree

4

Advantages

Construction and maintenance of R trees is simple

Nodes are not overlapped with each other, better performance of point query

Minimum Coverage and minimum overlap

Nodes are overlapped with each other 5

Disadvantages

R tree are smaller than R+ tree built over the same data

1. Since rectangles are duplicated, R+ tree can be larger than R tree built over same data 2. Construction and maintenance of R+ trees is more complex than R tree

Construction and maintenance of R∗ trees is more complex than R tree and R+ trees

6

Coverage

More coverage

Minimal than R tree

Minimal then R and R+ tree

5 Realization of diﬀerent encryption scheme for proposed framework To achieve the security of data, encryption is the best method. Encryption is defined as converting data from one form to another form i.e. from plane text to cipher text. However, the traditional encryption methods won’t work on the cipher text without converting it into plain text. The user have to sacrifice the privacy in order to make use of cloud services. Furthermore, untrusted public cloud service providers or cloud operators can keep watch on user’s data after users end the association with the service providers [66].

Different evaluation criteria’s [66] considered for comparison of several encryption methods used in the cloud for data confidentiality are listed below: a. Confidentiality of data: Any unauthorized user should not come to know about the information of encrypted data. b. Fine-grained access control: Each individual user linked with different access rights c. Scalability: System must be scalable and efficiently work even when the number of authorized users increases. d. User responsibility: When authorized user disclose his credential i.e. key to unauthorized users then authorized user must be responsible. e. User revocation: If authorized user leaves the system, then his/her private key should be revoked and access

123

S. V. Limkar, R. K. Jha Table 6 Comparison of various encryption scheme used for data confidentiality in the cloud Criteria

Homomorphic encryption [67–72]

Data confidentiality

×

Fine grained access

Scalability

×

×

×

User revocation

×

User rejoin

×

Collision resistance

Fast

×

×

×

User responsibility

×

×

×

Cipher text size

Larger

Larger

Larger

f.

g.

h. i.

ABE [73, 74]

KP-ABE [75]

to the data denied. This must be done without affecting existing users in the system. User re-join: Ability of considering the authorized user back into the system after revocation, without affecting existing users in the system. Collusion resistant: At any cost encrypted data files cannot be decrypted by combining the attributes of authorized users. Fast: This refers to the time required for performing encryption on plaintext. Cipher text size: This refers to the size of the generated cipher text.

Table 6 shows the comparison of various encryption scheme used for data confidentiality in the cloud.

5.1 Homomorphic encryption In this section, we explain basics of the Homomorphic Encryption (HE). Then, we present notable difference in, Partial Homomorphic Encryption (PHE), Somewhat Homomorphic Encryption (SWHE) and lastly Full Homomorphic Encryption (FHE) schemes. With the help of Homomorphic Encryption (HE) public cloud can able to perform certain computation on the encrypted data while protective the privacy of the data. This section also discuss, how this encryption is applied to our proposed method. A HE [85] method is mainly categorized by four operations such as Key Generation, Encryption, Decryption, and Eval function. Key Generation is the procedure for generating a pair of secret and public key for asymmetric form of HE or a single key for symmetric form of HE. Basically, the operations of key generation, encryption and decryption are same as conventional encryption methods. However, Eval is a HE-specific operation, whose input is cipher texts and also outputs is a cipher text which is exactly corresponds to plain text. Eval allows the function f () above the cipher texts with-

123

CP-ABE [76]

ABE W. non monotonic [77]

HABE [78]

AES [79]

HIBE [80–84]

×

×

×

×

×

×

×

×

×

×

×

×

×

×

Larger

Larger

Larger

Larger

Larger

out seeing the actual messages. The most important task in HE is preserving the format of the cipher text after completing the Eval function operation along with this size of the cipher text should be constant in order to support unlimited number of operations. Else more resources will be required resulting in to limited number of operations. Basic steps involved in homomorphic encryption is illustrated in Fig. 6. It consist of data owner, data user and cloud. First data owner encrypts his message m, Enc(m) and send it to cloud. If data user send search f () if he want to get some data from the encrypted data which is stored on the cloud. Then cloud evaluates f (m) homomorphically and returns the Enc(f (m)) to the data user. Data user computes Dec (Enc (f(m))) = f(m), and recovers f(m). In our proposed work IoT Zetta server generate key pair and then encrypt received IoT data and send it to the public cloud. When IoT data owner wants to perform some computation function over encrypted data, he sends the compute function to public cloud. After receiving compute function public cloud performs homomorphic operation on encrypted IoT data using the compute function and return the encrypted result to IoT data owner. Finally, IoT data owner recovers his data with his own key and obtain the actual data. Here compute function on the public cloud side does not require the private key of the IoT data owner and it allows numerous operations i.e. addition and multiplication on the encrypted data. Moreover, homomorphic encryption is expensive in terms of heavy computations and complicated mathematical operations involved in it. However, to overcome this various HE schemes in the literature found and are precisely categorized in three main categories with respect to the number of permitted tasks on the encrypted data such as: (1) Partially Homomorphic Encryption (PHE) [86–92] permits only one type of operation i.e. either addition or multiplication with an indefinite number of times. (2) Somewhat Homomorphic Encryption (SWHE) [93–96] permits some types of operations with a limited number of times. (3) Fully

Computing over encrypted spatial data generated by IoT

1. Encrypts his message m, Enc(m) and send it to cloud

3. Evaluate F() homomorphically

2. Search F() 5. Compute Dec (Enc( F(m))) =F(m), and recovers F(m)

4. Returns Enc(F(m)) Data Owner

Cloud

Data User

Fig. 6 A simple data owner, data user and cloud HE scenario

Homomorphic Encryption (FHE) [97–100] allows unlimited number of operations with unlimited number of times. All the above categories are summarized in the Table 7. The best part of PHE is that mathematical operation does not require the encrypted values to be decrypted or reveal during the entire process. The data remains confidential to the database operator or cloud service providers, who are operating on it and even if anyone come to know about the data then the compromised data gives only encrypted values. PHE techniques can achieve either addition or multiplication but not both. SWHE and FHE techniques are still not efficient in terms of memory usage, computation cost and costly bootstrapping operation [97]. One more significant limitation of FHE schemes is its memory requirement; ciphertext and public key sizes are very large in order to guarantee sufficient security to prevent against possible attacks. One more limitation of FHE technique is bootstrapping operation is very costly and it is required to reduce the noise that is incurred at the time of homomorphic operation performed on ciphertext. In our proposed method we used PHE Schemes i.e. Paillier [92]. Following three phases are involved in it.

For every message p, the number t is randomly chosen and the encryption works as follows: c = E( p) = f p t d (mod d 2 ) c. Decryption process For a proper ciphertext c < n 2 , decryption is

D(c) =

H (cλ (mod d 2 )) mod d = p, H ( f λ (mod d 2 ))

(2)

where private key pair is (a, b).

Homomorphic Property is E( p1 ) ∗ E( p2 ) = ( f p1 t1d (mod d 2 )) ∗( f p2 t2d (mod d 2 )) = f p1+ p2 (t1 ∗ t2)d (mod d 2 ) = E( p1 + p2).

a. Zetta server It is responsible for computing (Public, Private) Keys pair: First it chose large primes a and b such that gcd (ab, (a − 1)(b − 1)) = 1, than it compute d = ab and λ = lcm (a − 1, b − 1). Then, it selects a random integer f X d∗2 by inspecting whether gcd (d, H ( f λ mod d 2 )) = 1, where the function H is defined as H (u) = (u − 1)/d for every u from the subgroup X d∗2 which is multiplicative subgroup of integers modulo d 2 instead of d. Lastly, public key is (d, f ) and the c is (a, b) pair. b. Encryption process Data, which is collected from ADS-B Receiver/ flightradar24 are encrypted.

(1)

(3)

Above show that Paillier’s partial homomorphic encryption scheme is homomorphic over addition. Along with this it allows extra basic operation on plaintext p1, p2 ∈ xd∗2 by using the encrypted plaintext E( p1), E( p2) and public key pair (d, f ): E( p1) ∗ E( p2) (mod d 2 ) = E( p1 + p2 (mod d)) E( p1) ∗ f

p2

(mod d ) = E( p1 + p2(mod d)) 2

E( p) (mod d ) = E(kp (mod d)) k

2

(4) (5) (6)

123

S. V. Limkar, R. K. Jha Table 7 Summary of homomorphic encryption schemes S. no.

Homomorphic encryption schemes

Evaluate function

Hardness of the scheme

Homomorphic property

1

PHE [86–92]

Permits only one type of operation with an unlimited number of times.

RSA [86] is based on the factoring problem of the product of two large prime numbers [101]

Only multiplication

Goldwasser-Micali [87] is based on the quadratic residuosity problem [101]

Only addition

Elgamal [88]is based on the certain problems in discrete logarithm [101]

Only multiplication

Benaloh [89] is based on the higher residuosity problem [101]

Only addition

Paillier [92] is based on the composite residuosity problem [101]

Addition and some extra property such as scalar multiplication

Young [93] is based on garbled circuit Sander [94] is based on NC1 circuit

Arbitrary Polynomially many AND and one OR/NOT gate.

Boneh [95] is based on the subgroup decision problem.

Unlimited addition and only one multiplication

Ishai [96] is based on the branching programs

Arbitrary

Gentry [97] is based on the Ideal Lattice. Van Dijk [98] is based on the Approximate- Greatest Common Divisor (AGCD) problems [102]

Unlimited number of addition and multiplication.

2

3

SWHE [93–96]

Permits some types of operations with a limited number of times.

FHE [97–100]

Permits unlimited number of operations with unlimited number of times.

Brakerski and Vaikuntanathan [99] is based on learning with error.

Table 8 Sample extracted data of single flight position Aircraft ID

SG512

Aircraft Source airport

Aircraft destination airport

Aircraft current position

Latitude

Longitude

Latitude

Longitude

Latitude

Longitude

Altitude

17.254

78.431

23.07724

72.634651

22.39902

74.91647

3600

22.36407

75.0325

3600

22.33255

75.13809

3600

22.29863

75.2504

3600

22. 29765

75.2104

3600

22. 29548

75.2711

3600

22. 28713

75.2394

3600

Above Eqs. (4), (5), (6) shows, in what way the operation computed on encrypted data affects the plaintexts. Applying the encryption to the Table 8. Resulting encrypted values are shown in the “Appendix” section.

5.2 Proposed work The proposed work is divided into four stages i.e. input dataset, data partitioning, parallel architecture and lastly parallel tree construction which is shown in Fig. 7.

123

5.2.1 Input dataset We consider a set of moving aircrafts in a three dimensional space. The rough goal is to make knowledge about their real time current locations and keep the information up-to-date. The aircrafts are uniquely identified through a field called aircraft ID and their positions are represented in 3-dimensional space, i.e. in the form of a coordinates Latitude, Longitude, Altitude represented in the form of x, y, z coordinates. The data received from the moving aircraft are encrypted using partial homomorphic encryption scheme and this encrypted data is indexed into a 3-dimensional R tree and its variant’s

Computing over encrypted spatial data generated by IoT Fig. 7 Proposed work

1 Input

2 Data Paroning

3 Parallel Architecture Used

4 Tree Construcon

R Tree

Chunk-1/ RDD-1

Chunk-2/ RDD- 2

R+ Tree

( On Disk)

Encrypted Spaal Data Chunk-3/ RDD- 3

R* Tree

( In Memory)

Chunk-4/ RDD- 4

i.e. R+ tree and R* tree. Indexing is made in order to get the efficient results of spatial queries over encrypted spatial data and real time updates of aircrafts positions. ADS-B transponder broadcast data twice in a second. The data which we receive contains lots of information, but we are fetching and feeding only data related to position updates, i.e. Latitude, Longitude and Altitude with respect to each aircraft which is uniquely identified by its unique ID called as aircraft ID. Figure 8 shows a flowchart of fetching aircraft data from ADS-B receiver which is used in the first simulation scenario. Following Fig. 9 shows a flowchart for fetching aircraft data from flightradar24.com which is used in second simulation Scenario.

Flightradar24 dataset [56] In Fig. 10, blue colored, highlighted text depicts its flight ID, dark red colored depicts aircraft destination latitude and Longitude, purple colored depicts aircraft source latitude and Longitude, green colored depicts trail, which give the exact real time position of the aircraft and it gets updated twice in a second. As we receive enormous data from flightradar24. This whole data is not required for our proposed work, so we fetched only required data which is shown in Table 8, the last three columns are continuously updating for twice in every second. Received geodetic latitude (x), ¯ longitude ( y¯ ) and altitude (¯z ) which is of 2-by-1 vector need to be converted into 3-by-1 vector of Earth-centered, Earth-fixed (ECEF) position (a) ¯ 3by-1 vector [103–105]. The ECEF position is calculated from the geocentric latitude at mean sea-level (γs ) and longitude using:

Start

Detects ADS-B signal

Zea Driver decode the received ADS-B packet

If packet type is airbone msg

If packet type is idenﬁcaon

N

Y

N

Y

Retrieve Latude, Longitude, Altude

Retrieve the ﬂight source & desnaon informaon

Convert this info into JSON format

Apply Paillier’s Paral Homomorphic Encrypon

Upload above Encrypted data to Zea Server

Stop

Fig. 8 Fetching aircraft data from ADS-B receiver

123

S. V. Limkar, R. K. Jha

as equatorial radius, with following equations.

Start

γs = a tan((1 − p)2 tan x) E2 ks = 1 + (1/(1 − p)2 − 1)sin2 (γs )

Collect the data from ﬂightradar24.com through web streaming API

(8) (9)

5.2.2 Data partition Parse received data into ﬂight ID, Latude, Longitude, altude, source, desnaon

Convert data into JSON

Apply Paillier’s Paral Homomorphic Encrypon

Upload above Encrypted data to Zea Server

Sleep for 100 ms

Fig. 9 Fetching aircraft data from flightradar24.com through web streaming

⎤ ⎤ ⎡ ks cos γs cos y + z cos x cos y al a¯ = ⎣ am ⎦ = ⎣ ks cos γs sin y + z cos x sin y ⎦ an ks sin γs + z sin x ⎡

(7)

Flattening ( p) ¯ is used to define the geocentric latitude from ¯ mean sea-level and (ks ) as the radius at a surface point, ( E)

Various geospatial data partitioning techniques exist [106– 115]. In order to efficiently process large volume of geospatial data on multiple machines, we need to divide this dataset into small equal pieces and each individual piece need to be processed on individual machines. The performance of parallel computing is totally based on the quality of partition. To achieve good load balancing there is a highly necessity of implementing good partition strategy. Basically, there are two main types of partitioning strategy exist, i.e. space oriented and data oriented. In space oriented partitioning techniques generates sub partitions by spatially decomposing the current partition boundary into two equal subspaces. Decision to partition the data is purely based on the space, which suffers from data skews. Data oriented partitioning technique generates sub partitions by finding a cut such that each resulting sub partition contains roughly the same amounts of data. In our scenario we used second approach, i.e. data oriented partition. The following algorithm is used to extract the aircraft’s position dataset and partition it into a number of small parts so that we can build the index of R trees in a parallel way. Two inputs are required for this, first one is the JSON Data and the second one is the number, which specifies how many parts we need to divide this dataset. Output is the main JSON dataset partitioned into a number of parts. This partitioning algorithm is common in both the parallel framework i.e.Hadoop and spark data partitioning.

{ flight: "SG512",snapshot_id: null,status: "airborne",dep_schd: 1461416100,arr_schd: 1461422100,departure: 1461416572, arrival: 1461421824, eta: 1461421824, to_iata: "AMD", to_city: "Ahmedabad, Ahmedabad International Airport",to_pos: [23.07724,72.634651], to_tz_code: "IST", to_tz_offset: "5.50", to_tz_name: "India Standard Time",from_iata: "HYD",from_city: "Hyderabad, Hyderabad Rajiv Gandhi International Airport", from_pos: [17.254,78.431], from_tz_code: "IST", from_tz_offset: "5.50",from_tz_name: "India Standard Time", airline: "SpiceJet",aircraft: "Boeing 737-7GL",airline_url: "http://www.flightradar24.com/ data/airplanes/sej-sej",image: "http://d31asmy75eposw.cloudfront.net/200/5/84101_1450424561_tb.jpg?v=0",imagelink: "http:// external.flightradar24.com/redir.php?url=http%3A%2F%2Fwww.jetphotos.net%2Fphoto%2F8159824",copyright: "BHASKAR GURUPRASAD",imagesource: "Jetphotos.net",image_large: "http://doygv6stby4r3.cloudfront.net/ image.php?params=U0ZHNjBaQzg0TnlvQWJLd3htODRqTldQWXFyWjYxM0M4T1dFamIrUmNJYnE1SlIxejY3NWRrUnAwSENoc DhUdWMvT2V1M1hVTmZ2OFlXSHNIdDVsN3hySFFLWDBGOURxN2FSdFU1VC8yeDdEaEVzdmpzRVdUaVFsYjQzRiszbDQ=", imagelink_large: "http://external.flightradar24.com/ redir.php?url=http%3A%2F%2Fwww.jetphotos.net%2Fphoto%2F8159824",copyright_large: "BHASKAR GURUPRASAD", q: "98086a0", trail: [22.39902,74.91647,3600, 22.36407, 75.0325,3600,22.33255,75.13809,3600,22.29863,75.2504,3600],first_timestamp: 0 } Fig. 10 A piece of JSON data received from Flightradar24.com for single flight position update

123

Computing over encrypted spatial data generated by IoT

Input: JSON Data, number of parts Output: array 1. 2. 3. 4. 5. 6.

Flight[] = fetch the flight records from raw JSON data. Count= Flight.Length/no of parts FlightArray[]= empty for (i=0;i < flight.length;i++) FlightArray[i % no of parts].add(flight[i]) Return flightArray;

5.2.3 Parallel architecture We worked on three different parallel architecture, i.e. Apache MapReduce, Apache Spark on disk computation and Apache Spark in-memory computation along with traditional single core architecture i.e. sequential architecture, these are explained as follows. (i) MapReduce framework [116] MapReduce is developed at google. The main intention behind the evolution of MapReduce framework is to support parallel processing across a large amount of data on clusters of commodity hardware [116]. With this framework, user can concentrate on computation part only, without worrying about parallelization. In this framework whole task is broken into many small parts of two types i.e. map and reduce. The mapper input is in the form of (key, value) pair, and generates intermediates (key, value) pair. Then, this framework combines the values with same intermediate keys. The reducer takes input as the same intermediate key, along with all the intermediate values which belongs to that key and generates a list of output values. Master of this framework manages all the process such as it activates the input file partitioning, assign mappers and reducer, coordinates the communication between mappers are reducers. Open source implementation of this framework is called as Hadoop [117]. Hadoop is the combination of MapReduce framework and distributes file system (HDFS). (ii) Apache Spark [118] MapReduce [116] is limited to batch processing, storm [119] is limited to real time stream processing, Impala [120] is limited to interactive processing, Giraph [121] is limited to graph processing. Every solution we need a specialized system which is very costly i.e. it requires various types of resources and team of developers, also it is very tough to manage. So there is a need to have a general purpose system which can handle all the requirements such as; powerful engine for processing data in real time and perform in memory analytics. Apache Spark is the powerful open source engine that provides real time, interactive, graph, in-memory as well as batch processing. The

main component of Apache Spark are spark context, cluster manager, worker node which is explained in details as follows. Spark context It is defined in the driver program. It is the main entry point of the spark application. It interacts with the cluster manager. It specifies how to access the cluster using this we can control the spark applications. RDDs are created using spark context only. Cluster manager Spark context interacts with the cluster manager and get the resources. Cluster manager keeps track of all the resources like, how many cores of CPUs, how much memory, how many executors are available in the overall cluster. Worker node Inside worker node executors is running. Executors are running inside the Java Virtual Machine (JVM) and there will be multiple task running inside the executors. In the diagram more than one task is running inside each executor. Task are the fundamental units of execution. Cache is the important property of the spark. It can cache the data, i.e. keeps the data in-memory. Cache is also used for inmemory computation. The complete architecture of Apache Spark is shown in Fig. 11. Difference between Apache Spark and MapReduce framework is shown in Table 9. Difference between traditional stream processing and Apache Spark stream processing gives clear idea of the Apache Spark which is given in Table 10.

5.2.4 Tree construction First we constructed R tree and its variants over encrypted geospatial data. Construction of minimum bounding rectangles i.e. MBR’s are not proper as the size of the ciphertext is too large. If we still want to construct the MBR’s then the size of the ciphertext gets truncated and resulting MBR’s not getting at appropriate size. So to overcome this, we decrypted only latitide, longitude and altitude values (we have not decrypted aircraft ID) for the merging of two R tree and its Variants. We proposed a novel algorithm for merging of two R tree and its variants, which works on reducer of both the frameworks i.e. MapReduce and Apache Spark, by using a plane sweep algorithm [122]. Consider there are two different R trees such as R_Tree and S_Tree. R_Tree MBRs are a1 , a2 , a3 , a4 where as S_Tree MBRs are a5 , a6 , a7 , a8 , a9 . In order to combine these two trees we need to find out the pairs of rectangles who are intersecting with each other. Figure 12 shows the Set P of the rectangle (MBR’s of R_Tree and S_Tree) which are aligned on the X-Axis. A plane sweep algorithm is used to get the pair of intersected rectangle.

123

S. V. Limkar, R. K. Jha

Cluster Manager

Spark Context

Worker Node

Worker Node

Executer

Cache

Executer

Cache

Task

Task

Task

Task

Datanode

Driver Program

Datanode

HDFS

Fig. 11 Architecture of Apache Spark

Input: Set P of the rectangle which are aligned on the X-Axis Output: Pair of intersected rectangles For every pair (ai , a j ) of rectangles S, i = j If (ai ∩ a j = Φ) then report (ai , a j ) Output: (a1 , a2 ) (a1 , a3 ) (a1 , a8 ) (a3 , a4 ) (a3 , a5 ) (a3 , a9 ) (a4 , a5 ) (a7 , a8 ) Here the output is in two groups, first group is the pair of rectangles who are intersecting with each other and second group is the rectangles who are not intersecting with another. Once we find the pair of rectangle, then we combine those rectangles to form a larger MBR, if following condition get satisfied. m≤n≤M where n is the number of aircrafts. If the above condition is not satisfied, then check whether n > M, if yes, then call either condense or split algorithm of R tree and its variants. Then build R tree and its variants using a bottom up approach, while merging MBR’s check condition m <= M at each level of the tree. Insert unpaired rectangles MBR’s using a standard insertion algorithm of R trees and its variants [57–59] Hence, in this way we will get final R tree and Its Variants.

123

Pseudocode Step 1 Receive encrypted data and partition it into equal size of a chunk’s and RDD’s and store it into HDFS. Step 2 If encrypted data is partitioned into chunks, then call MapReduce else call Apache Spark in-memory or Apache Spark On-Disk computation. Step 3 Workers i.e. mappers are assigned to each chunk’s and RDD’s. Step 4 Each mapper first decrypt the received encrypted data (only Lat, Long and Alt). After decrypting each mapper constructs R tree and its variants from received data using an existing standard algorithm [57–59]. Step 5 Workers i.e. reducers are assigned to merge the trees. Step 6 Find a pair of leaf nodes MBR’s, who are intersecting with other leaf nodes MBR’s using plane sweep algorithm. Step 7 For set of paired rectangles who satisfies the condition m ≤ n ≤ M, merge these rectangles and form a new MBR. Step 8 If condition in earlier step is not satisfied, then check n > M, if this condition satisfies then call condense / split algorithm of R tree and its variants. Step 9 Build R tree and its variants using a bottom up approach, while merging MBR’s check condition m <= M at each level of the tree. Step 10 Insert all unpaired MBR’s one by one using a standard insertion algorithm of R trees and its variants.

Computing over encrypted spatial data generated by IoT Table 9 Difference between Apache map reduce and Apache Spark S. no.

Characteristics

Apache Spark

MapReduce

1

Speed

It is superfast cluster computing tool. It executes applications up to 100× faster in memory and 10× faster on disk compared to Hadoop MapReduce. This is achieved by minimizing the number of read/write cycle to disk and intermediate data is stored in memory.

It does not leverage maximum utilization of memory of the Hadoop cluster, that’s why it reads and writes the data from disk leading to decrease the processing speed

2

Difficulty

It is easy to program because it has lots of high level operators with Resilient Distributed Dataset (RDD), which makes it very easy to work

It needs to write a hard code for each and every single task, which makes it very difficult to work

3

Easy to manage

It can perform batch processing, interactive machine learning and streaming all inside the same cluster. No need to manage different modules as requirements arise

Designed to perform only batch processing application. So it depends on other modules such as storm, giraph, impala, etc. Resulting it into very complex system to manage multiple components

It is very easy to manage multiple components 4

Real time analysis

It can process real time data. Its asset lies into efficiently processing live streams as it arrives in real time with zero or no latency

It fails to process real time data, as it was intended to accomplish batch processing on big data

5

Latency

It provides almost zero latency computing framework. As it involves in memory computation

It is a very high latency computing framework. As it involves on disk computation

6

Interactive mode

Interactive data can be processed

It does not support interactive mode

7

Storage used

Memory/disk

Disk

8

Streaming

Spark streaming is used to process real time data

It can only support batch processing mode

9

Ease of use

It is very simple and easy to use; its simple abstraction, makes user to process data using high level APIs

It is a complex and difficult to use; makes user to process data using low level APIs. Which necessitates lots of hand coding for each and every task

10

Replication

It does not implement replication

It implement replication, default factor is 3

11

Recovery

It allows recovery of partitions on failed nodes by re-computation of the DAG. It also support check pointing style of Hadoop MapReduce

It is resilient to system faults and failures, resulting in to very high fault tolerant system

12

Queries handled

It can handle both real time as well as historical queries

It can handle only historical queries

13

Ease

It is very easy to program because it does not require any abstractions

It is very difficult to program because it requires abstractions

14

Caching

It caches its partial or intermediate results across its memory of distributed workers

It caches its partial or intermediate results across its disk of distributed workers

15

Security

It is still in its infancy

It has more security features

16

Data modification

It allows user to modify the data in real time through streaming

It allows user to process batch of stored data.

Table 10 Traditional stream versus spark streaming processing

S. no. Characteristics

Tradition stream processing

Spark stream processing

1

Processing

One record at a time

Discretizes data into small chunks

2

Load Balancing

Static

Dynamic

3

Failure recovery

Slower

Faster

4

Unification of streaming, batch and interactive workloads

Not possible

Possible

5

Advanced analytics

Not possible

Possible

6

Latency

High

Low

7

Performance

Low throughput

High throughput

123

S. V. Limkar, R. K. Jha

(b) Parallel tree construction on Apache Spark on disk computation

a6 a7

It consist of eight steps that are as follows.

a5 a8

a4

a3 a1

a2

a9

Fig. 12 Finding intersected rectangles i.e. MBRs of two tree

We used three different parallel architectures for parallel construction of R tree and its variants. Flow of the proposed pseudocode is given in Fig. 13. Detailed explanation of each architecture is as follows. (a) Parallel tree construction on MapReduce framework It consist of eight steps that are as follows. i. First all encrypted data is stored on to the Hadoop distributed file system (HDFS). ii. Whole encrypted data is partitioned into equal size of chunks using data oriented partitioning technique. iii. Workers i.e. mappers are assigned to each and every chunk. iv. Each mapper first decrypt the received encrypted data (only Lat, Long and Alt). After decrypting each mapper constructs R tree and its variants on received data using an existing algorithm given in [57–59] and stores its results into HDFS. Here mapper constructs three different tree, i.e. R tree, R+ tree and R* tree. v. Again, workers i.e. reducers are assigned. vi. In this step, each reducer will take the input from exactly two mappers and combines these two input trees generated by mappers, by using our tree merging algorithm. Results are stored in HDFS. If reducer gets two R tree constructed by two mappers then it reduces into one R tree. Same for R+ tree and R* tree. vii. If still we don’t get the final tree then earlier steps i.e. V followed by VI is repeated, till we get final tree. viii. Final tree is stored into HDFS. The complete flow of the proposed algorithm for the merging of two R tree and its variants using MapReduce is shown in Fig. 14.

123

i. First all encrypted data is stored on to the Hadoop distributed file system (HDFS). ii. Whole encrypted data is partitioned into a series of very small, deterministic batches of equal size called as RDD, using data oriented partitioning technique. Each batch of data is treated as RDDs in spark and processes them using RDD operations. iii. Workers i.e. mappers are assigned to each RDDs. iv. Each mapper first decrypt the received encrypted data (only Lat, Long and Alt). After decrypting each mapper constructs R tree and its variants on received data using an existing algorithm given in [57–59] and stores its results into HDFS. Here mapper constructs three different tree, i.e. R tree, R+ tree and R* tree. v. Again, workers i.e. reducers are assigned. vi. In this step, each reducer will take the input from exactly two mappers and combines these two input trees generated by mappers, by using our tree merging algorithm. Results are stored in HDFS. If reducer gets two R tree constructed by two mappers then it reduces into one R tree. Same for R+ tree and R* tree. vii. If still we don’t get the final tree, then earlier steps i.e. V followed by VI is repeated, till we get final tree. viii. Final tree is stored into HDFS. The complete flow of the proposed algorithm for the merging of two R tree and its variants using Apache Spark on disk computation is similar to Fig. 15 except data partitioning steps i.e data is partitioned in to RDD’s. In Apache Spark on-disk computation intermediate and final results are stored onto distributed shared memory which is slower then Random Access Memory (RAM) and faster than Disk. (c) Parallel tree construction on Apache Spark on disk computation It consist of eight steps that are as follows. i. First all encrypted data is stored on to the Hadoop distributed file system (HDFS). ii. Whole data is partitioned into a series of very small, deterministic batches of equal size called as RDD, using data oriented partitioning technique. Each batch of data is treated as RDDs in spark and processes them using RDD operations. iii. Workers i.e. mappers are assigned to each and every RDDs.

Computing over encrypted spatial data generated by IoT

Start Received data partitioned into equal size of chunk’s and RDD’s If data is partitioned into chunks then call MapReduce else call Apache Spark in-memory and Apache Spark On-Disk computation Workers i.e mappers are assigned to each chunk’s and RDD’s Each mapper construct R Tree and its variants after decrypting ( Only latitude, longitude, altitude) the received encrypted data using existing standard algorithm Workers i.e reducers are assigned to merge the trees Find pair of leaf nodes MBR’s, who are intersecting with each other using plane sweep algorithm

Paired rectangles

N

n>M Y Call Condense / split algorithm of R Tree and its variants

Unpaired rectangles

Validate m<=n<=M Y Merge and form a new MBR

Build R Tree and its variants using bottom up approach, while merging MBR’s check condition m<=M at each level of tree N

K>=0 Y

Insert K unpaired MBR’s using standard insertion algorithm of R Trees and its variants. K--

Final R Tree and Its Variants

End Fig. 13 Flow of proposed work

123

S. V. Limkar, R. K. Jha

Input Data 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0

1

Hadoop Distributed File System (HDFS) 2

Chunk-1

Chunk-2

0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0

Chunk-3

0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0

Mapper-1

Chunk-4

0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0

Mapper-2

Mapper-3

0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0

Workers (Mappers) are assigned

4

Each mapper generates R tree and its variant's and stores its result in to HDFS

5

Workers (Reducers) are assigned

6

Each reducer merges R tree and its variant's and stores its result in to HDFS

Mapper-4

Reducer-2

Received Data is paroned in to number of equal size called as Chunks

3

Distributed File System (HDFS)

Reducer-1

Encrypted Input Data is passed to HDFS

Distributed File System (HDFS)

Reducer-3

7

Workers (Reducers) are assigned

Distributed File System (HDFS)

8

Fig. 14 Flow of algorithm for the merging of two R tree and its variants using MapReduce

123

Each reducer merges R tree and its variant's and stores its result in to HDFS

Computing over encrypted spatial data generated by IoT

Distributed File System (HDFS) 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0

1

0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0

RDD-1

RDD-2

0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0

RDD-3

0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0

Mapper-1

RDD-4

0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0

Mapper-2

Mapper-3

0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0

2

Workers (Mappers) are assigned

4

Each mapper generates R tree and its variant's and stores its result in to memory

5

Workers (Reducers) are assigned

6

Each reducer merges R tree and its variant's and stores its result in to memory

Mapper-4

Reducer-2

Persistent Memory

Reducer-3

Received Data is paroned in to number of equal size called as RDD

3

Persistent Memory

Reducer-1

Encrypted Data is passed from HDFS

7

Persistent Memory 8

Workers (Reducers) are assigned

Each reducer merges R tree and its variant's and stores its result in to memory

Fig. 15 Flow of algorithm for the merging of two R trees and its variants using Apache Spark in-memory computation

123

S. V. Limkar, R. K. Jha

iv. Each mapper first decrypt the received encrypted data (only Lat, Long and Alt). After decrypting, each mapper constructs R tree and its variants on received data using an existing algorithm given in [57–59] and stores its results into memory. Here mapper construct three different tree i.e. R tree, R+ tree and R* tree. v. Again, workers i.e. reducers are assigned. vi. In this step, each reducer will take the input from exactly two mappers and combines these two input trees generated by mappers, by using our tree merging algorithm. Results are stored in HDFS. If reducer gets two R tree constructed by two mappers then it reduces in to one R tree. Same for R+ tree and R* tree. vii. If still we don’t get the final tree, then earlier steps i.e. V followed by VI is repeated, till we get final tree. viii. Final tree is stored inside the memory. The complete flow of the proposed algorithm for the merging of two R tree and its variants using Apache Spark in-memory computation is shown in Fig. 15.

6 Experimental setup In this section we discuss the experimental setup which comprises the details of the hardware specifications, Dataset, workload. Afterward, we present the results and discussions.

6.1 Hardware specifications We conducted our experiments on Dell PowerEdge T430 Tower Server 1 x Intel Xeon E5-2609v4 (1.7GHz/8C/20MB/ 85W/1866Mhz), 8 x 8GB DDR4 RAM, 64 bit SUSE Linux Enterprise 12 SP2 as a base operating system.

All results are based on this dataset.

6.3 Workload (1) MapReduce Framework (2) Apache Spark Framework

6.4 Cryptographic setting Data owner i.e. ADS-B receiver is having inbuilt shared public key of 512-bit through which it encrypts the received data and transmit it through unsecured channel to IoTZetta server whereas data user i.e. end user who uses this data is working on standard existing security i.e. HTTPS.

6.5 Software required 1. SUSE Linux Enterprise 12 SP2 as a base operating system. 2. VMware Workstation 12.5.6 Player for Linux 64-bit Operating Systems [124] 3. Cloudera CDH 5.8 [125]: It is the fastest, easiest, and highly secured Hadoop platform. It is an end to end application to manage and deploy Hadoop clusters. It comes with a stack of services such as Hbase, Hive, Sqoop, etc., inside its clusters. It uses REST APIs. It allows us deploy and manages Hadoop clusters and it services programmatically. The main advantage is that it saves a considerable amount of time and efforts if we have to do the same things like deploying a cluster. A cluster consist of three nodes, i.e. one master and two slaves. The master node runs most of the services and therefore requires higher memory. It is shown in Fig. 16.

6.2 Dataset We used two different sources of data sets. 6.2.1 ADS-B receiver [123] We used GNS 5890 ADS-B Receiver USB-Stick which works on ADS-B 1090 MHz. It is capable to capture the real-time aircraft’s position within its range of around 300 Km, but it can capture only few aircraft locations. Since the number of aircraft captured by these receivers are very few as compared to flightradar24. 6.2.2 Flightradar24 [56] It can capture thousands of aircraft’s real-time position, each aircrafts location is updated twice in every second. We considered this dataset.

123

Hadoop 1 (Master) Name Node Task Tracker Cloudera Manager 3-CPU, 16 GB RAM IP Address: 192.168.1.1

Hadoop 2 (Slave) Secondary Name Node Data Node Task Node 1-CPU, 8 GB RAM IP Address: 192.168.1.2

Hadoop 3 (Slave) Data Node Task Node 1-CPU, 8 GB RAM IP Address: 192.168.1.3

Hadoop 4 (Slave) Data Node Task Node 1-CPU, 8 GB RAM IP Address: 192.168.1.4

Fig. 16 Cloudera cluster

Computing over encrypted spatial data generated by IoT

7 Threat model

8 Security analysis

In this paper we focused on IoT application, where data is collected from sensor and stored on third party cloud. We considered three parties involved in it: Cloud, Data owner, Data user and an Identity Provider (IDP) which is present inside the zetta server to certify the public key of each user.

8.1 Security analysis and attack model Definition 1 if there exist a plain text Pt and cipher text Ct. Cryptosystem said to be perfectly secrete only if, for any g ∈ Pt and any h ∈ Ct, PR (pt = pt | ct = Ct) = PR (pt = pt).

7.1 Threats We considered the cloud is honest but at the same time it is curious i.e. it will follow the rules correctly, along with this it tries to learn from the stored data. This is a standard model where violation of rules results in to penalizing the service provider. However, the internal adversary always zealous to passively analyze the data without being detected. Few chances cannot be ignored, where external adversary can also gain access to the encrypted data if system security gets compromised. We consider cloud-side threats which are due to curious administrator and threats while data is transferred from edge devices to cloud through unsecured communication media.

7.2 Assumptions Along with the presence of the honest but curious cloud, our proposed system assumes that the Identity Provider (IDP) properly validates users’ or devices identity key pairs. The IDP is the basic requirement in multi-party systems, which is also a trustworthy for both the involved parties i.e. external and an internal party. Moreover, proposed system assumes that the system work properly and never ever share user keys to malicious parties or third parties. Finally, we assume standard state of the art security mechanisms is present for IoT device security and communication amongst all the party is over secure channel.

7.3 Guarantees Our proposed system guards the privacy of IoT device generated data, which is stored in the cloud. However, if IoT device gets compromised and group keys are disclosed, only data of the compromised device and the affected group are disclosed. Proposed system is also secure against, Eavesdropping, Man in the middle attack (MitMA), Data interception and theft, because the data which is transferred from edge devices to cloud is in the form of cipher text. Even if this data gets available to the adversary, it can’t be decrypted without the proper keys. Our proposed system cryptographically avoids unauthorized data access

Attack model Known ciphertext only attacks is considered for analyzing the security of our proposed scheme. In this model, the IoTZetta cloud server is only holding encrypted dataset, which is outsourced bye the data owner. In this type of attacks, attacker has the cipher text of several messages, all of which have been encrypted with the encryption algorithm. Attacker’s job is to recover the plain text of as many messages as possible or better yet to deduce the key used to encrypt the messages, in order to decrypt other messages encrypted with the same keys. If we received ciphertext value = 0, then plaintext pt = 0. Because of this, all the times pt = 0 is encrypted as non-zero element in X d n . Theorem 1 The proposed privacy model maintains perfect secrecy of the IoT generated data. Proof Let {γ1 , . . . . . . γn } be a basis of X d n over X d . For any cipher text ct ∈ X d∗n , there exist j ∈ {1 . . . n} such as T1n (γ j ct) = 0. For any plain text pt ∈ X d and any a j ∈ X d , i ∈ {1..n}\{ j}, define a j = pt −

i∈{1...n}{ j}

ai T1n (γi ct) (T1n (γ j ct))−1

n n n n Then, we have i=1

an i T1 (γi ct) = pt,n i.e., T1 ( i=1 ai γi ct) = pt. Define β = i=1 ai γi , then T1 (βct) = pt. For pt = X d∗ , there are d n−1 possible β ∈ X d∗n such that T1n (βct) = pt. If pt = 0 and ai = 0 for i ∈ {1 . . . n}\{ j}, then a j = 0, which leads to β = 0. So, for pt=0, there are only d n−1 − 1 possible β Fd∗n such that T1n (βct) = m. Note: We assumed pt = 0 is encrypted as a nonzero element in X d n . Hence, for any pt ∈ X d and ct ∈ X d∗n . As root ct ∈ X d∗n is arbitrarily selected, then Ar(ct = ct|pt = pt) =

d n−1 1 . 1 = d n−1 , if pt ∈ X d∗ , d n−1 d n−1 n−1 d −1 1 . 1 = d n−1 , if pt − 0. d n−1 d n−1 −1

As for any m ∈ X d , PR(pt = pt) = 1/d, for any cipher text ct ∈ X d∗n , PR(ct = ct) = Ar(pt = pt) PR ct = ct|pt = pt) pt∈X d

1 = n . d −1

123

S. V. Limkar, R. K. Jha

With the help of Bayes’ theorem, for any pt ∈ X d and any ct ∈ X d∗n , PR(pt = pt|ct = ct) PR(ct = ct|pt = pt) PR(pt = pt) 1 = = = PR(pt = pt) PR(ct = ct) d Therefore, from the definition of partial homomorphic encryption, our proposed technique maintains perfect secrecy. Where, P R = Conditional Probability X d = finite field, d = Power of a prime, T1n (.) = Trace function required for a linear transformation from X d n onto X d .

8.2 Load balancing The performance of parallel computing is totally based on the quality of partition. If the data is not partitioned properly, load imbalance arises. To achieve good load balancing there is a highly necessity of implementing good partition strategy. In the proposed work, the data oriented partitioning method is used to decide equal partition cut and then with the proposition 1 balancing of load can be achieved. Proposition 1 If the received geospatial data A is suitably huge, suitable amount of geospatial data will be allotted to each mapper once the data is partitioned. Proof “A is suitably huge” it means the data which we received after partitioning is adequate for consideration of parallel distribution across the different mapper. The partitioning algorithm partition the received geospatial data into M mappers. Let Mi be i-th mapper. Then M denotes the number of partitioned geospatial data in the mapper and can be shown as follows: A |Mi | = M

(10)

If case where geospatial data distribution is defined properly by the generated geospatial dataset D. M represents the total number of geospatial dataset, can be represented as follows: ||Mi || = |Mi |

|D| |D| = A M

(11)

8.3 Speedup The important reason behind using parallelization of a job is to run the job faster i.e. to speedup. Speedup in parallel

123

program can be assessed with the support of Amdahl law [126]. Next we examine the speedup of the algorithm. The speedup can be represented in terms of following equation as follows: Speedup (S) =

Ts Tp

(12)

where Ts denotes the total time required for sequential execution of the job, Tp denotes the total time required for execution of parallel jobs. In our algorithm, the sequential execution time Ts can be represented as follows: Ts = TPart + TEncr

(13)

Where TPart represents the time required for partitioning the received data and TEncr represents the time requires for encryption of the received data. The parallel execution time Tp can be represented as: Tp =

Tmerg Ttree Tintr Ttree Tdecr + + + + + Tover M M r r r

(14)

where Tdecr is the time required for decrypting only latitude, longitude and altitude, Ttree is the time required to construct R tree and its variants, Tintr represents the time required for finding the pair of intersected rectangles, Tmerg is the time required for merging the two rectangles to form a larger rectangle. Tover is the overhead caused by parallel programing which consist of the cumulative time spent on communication, and synchronization, M indicates number of mappers concurrently working on data and r indicates number of reducers concurrently working on data. The speedup can be represented by combining the Eqs. (13) and (14), as follows: S=

TPart + TEncr Tdecr M

+

Ttree M

+

Tintr r

+

Tmerg r

+

Ttree r

+ Tover

(15)

In MapReduce framework, every time the data are read from the disk and the result is stored back to the Hadoop Distributed File System (HDFS) after a specific iteration and again data gets read from the HDFS for the next iteration. This whole process involved multiple read/write operations, which consume a lot time, resulting in too high latency. Whereas in case of Apache Spark in memory computation, data is read from the memory, resulting in reduced latency compared to MapReduce framework. As we know parallel algorithm may exhibit dissimilar performance on different parallel architecture. That is applicable to MapReduce framework and Apache Spark architecture.

Computing over encrypted spatial data generated by IoT

8.4 Scalability Scalability is a measure of a parallel system’s capacity to increase speedup in proportion to the number of processors. Parallel system is said to be scalable if its efficiency is increasing by increasing the number of processor and problem size at the same time otherwise it is called as non-scalable parallel system. X=

Sequential execution time with data size D Parallel execution time with problem size D

(16)

Equation (16) is similar to the definition of the speedup which is given in Eq. (12). In speedup problem size is fixed, whereas in scalability, speed is fixed.

8.5 Efficiency The efficiency is defined as the ratio of speedup to the number of processors. Efficiency measures the fraction of time for which a processor is usefully utilized. E=

Ts S = P pT p

(17)

where S is the speedup, P is the number of processors.

8.6 Quality of generated R-tree and its variants on parallel architecture For enhancing the search performance [84] there is high necessity to have a minimal overlap and a minimal area because these factors are highly responsible for path pruning in R trees and its variants. In our case, all MBR’s are in three dimensions and area of cuboid is calculated in terms of volume. Total volume of leaf nodes is considered as the most crucial factor for the overall performance of R-tree and its variants in case of bottom up method. We can calculate the volume and overlap parameters for newly constructed R tree and its variants on parallel architecture given by following: Volume (R) =

r

Volume(Ri .M B R)

i=1 n

Volume (LN) =

i=1 r

Volume (LNi .M B R)

(18)

(19)

and its variants generated by each individual mappers (the number of children) of R, Ri is the i-th child node of R. Volume (LN) is the total area of leaf nodes, n is the number of leaf node, LN iis the i-th leaf node. All above parameters are calculated by considering spatial objects are in three dimensions, i.e. latitude, longitude and altitude. The quality attributes of constructed R tree and its variants over encrypted data and normal data on MapReduce and Apache Spark architecture by considering various numbers of partitions, number of reducers and sequential process (SP) which works on a single CPU are interpreted in Tables 11, 12, 13 and 14. Where D represents on disk computation, M = represents in memory computation and SP represents sequential processing on single CPU.

Table 11 Notations and its meaning Notations

Meaning

a, b

Secrete key pair

d, f

Public key pair

t

Block size used for the message

X d∗2

Additive subgroup of integers modulo d 2

p, Pt

Plain text

E

Encryption

Ct

Cipher Text

Xd

Finite field, where d is power of prime

T1n

Trace function

PR

Conditional Probability

x

Latitude

y

Longitude

z

Altitude

M

Mapper

A

Data recevied after partitioning

Mi

i-th mapper

r

Reducer

A

Geospatial partitioned data

D

Geospatial dataset

TPart

Time required for partitioning received data

TEncr

Time requires for encrypting received data.

Ts

Time required for sequential execution

Tp

Time required for parallel execution

Ttree

Time required to construct R tree and its variants

Tintr

Time required for finding the pair of intersected rectangles

(20)

Tmerg

Time required for merging the two rectangles to form a larger rectangle.

where R indicates the root, volume (R) represents the total area of the root node, r is the total number of sub R-trees

Tover

Overhead time caused by parallel programing

Overlap(R) =

r

Volume(Ri .M B R ∩ R j .M B R)

i=1 j=i+1

123

123

–

SP

–

19

15

11

7

3

Number of reducers

5

5

5

5

5

5

5

5

5

5

5

5

12,142

12,175

12,160

12,152

12,148

12,144

12,142

12,175

12,160

12,152

12,148

12,144

Encrypted

Normal

Normal

Encrypted

Leaf Nodes

Height of the tree

Number of mappers

4

8

12

16

20

–

Partitions

4

8

12

16

20

SP

–

19

15

11

7

3

Number of reducers

5

5

5

5

5

5

5

5

5

5

5

5

12740

12784

12768

12760

12755

12751

12740

12784

12768

12760

12755

12751

Encrypted

Normal

Normal

Encrypted

Leaf Nodes

Height of the tree

Table 13 Quality parameters of the constructed R+ tree over encrypted and normal data on Apache Spark (D)

16

12

12

20

8

8

16

4

4

20

Number of mappers

Partitions

Table 12 Quality parameters of the constructed R tree over encrypted and normal data on MapReduce

6.89E+16

6.107E+16

6.102E+16

6.99E+16

6.94E+16

6.89E+16

Normal

5.35E+16

5.58E+16

5.53E+16

5.48E+16

5.45E+16

5.40E+16

Encrypted

6.89E+16

6.107E+16

6.102E+16

6.99E+16

6.94E+16

6.89E+16

Encrypted

Root Node area

5.35E+16

5.58E+16

5.53E+16

5.48E+16

5.45E+16

5.40E+16

Normal

Root Node area

5.75E+21

5.98E+21

5.94E+21

5.88E+21

5.83E+21

5.77E+21

Encrypted

6.40E+21

6.62E+21

6.55E+21

6.49E+21

6.46E+21

6.41E+21

Normal

6.40E+21

6.62E+21

6.55E+21

6.49E+21

6.46E+21

6.41E+21

Encrypted

Leaf node area

5.75E+21

5.98E+21

5.94E+21

5.88E+21

5.83E+21

5.77E+21

Normal

Leaf node area

3.62E+18

3.79E+18

3.74E+18

3.69E+18

3.66E+18

3.62E+18

Normal

Overlap

8.01E+18

8.27E+18

8.17E+18

8.12E+18

8.08E+18

8.02E+18

Normal

Overlap

3.62E+18

3.79E+18

3.74E+18

3.69E+18

3.66E+18

3.62E+18

Encrypted

8.01E+18

8.27E+18

8.17E+18

8.12E+18

8.08E+18

8.02E+18

Encrypted

S. V. Limkar, R. K. Jha

3.39E+19 3.39E+19 1.09E+22 1.01E+17 1.01E+17

1.09E+22

3.68E+19

3.60E+19 3.60E+19

3.68E+19 1.33E+22

1.27E+22 1.27E+22

1.33E+22

1.21E+17

1.27E+17

1.21E+17

1.27E+17

3.44E+19

3.51E+19 3.51E+19 1.23E+22 1.13E+17 1.13E+17

1.23E+22

3.40E+19 3.40E+19

3.44E+19 1.16E+22 1.16E+22 1.08E+17 1.08E+17

1.09E+22 1.03E+17 1.03E+17

1.09E+22

The results obtained through the simulated environment shows that the proposed framework has achieved a security of the IoT generated spatial data while it is in transit through unsecured communication medium and when it is stored on public cloud. We performed various operation on encrypted data such as indexing it in to R tree and its variants, parallel merging of two encrypted trees on three different parallel architecture, i.e. MapReduce framework, Apache Spark on disk computation and Apache Spark in-memory computation, inserting new spatial objects and shooting range query on encrypted data etc. We compared operation performed on encrypted spatial data and non-encrypted spatial data. As we know each node of R tree and its variants contains at least m and at most M entries, where m ≤ M/2. In the implementation we considered m = 3 and M = 10. Dataset of 10,000 flights are used. Following parameters are used to evaluate the performance.

12,368 5 5

12,368

13,423

13,406 13,406

13,423

5

5

5

5

We considered geospatial data is of three dimensional, i.e. latitude, longitude and altitude, therefore the resulting leaf node MBRs are is of cuboid type and its volume is measured in cubic centimeters. This volume is same for all the trees i.e. tree constructed on sequential architecture and all three parallel architectures which is shown in Figs. 17, 18, 19, 20.

– – SP

15

19

16

Fig. 17 Leaf node MBR’s area of sequential R trees and its variants constructed on encrypted data and normal data

20

16

13,398 11 12

12

13,398 5 5

13,393

13,389

5 5 7

13,393

5 5 3

8 8

4

13,389

Normal Encrypted Normal

Number of reducers

4

20

Normal Encrypted Normal Encrypted Normal

Leaf nodes Height of the tree

Encrypted

Leaf node area Root Node area

9 Result

9.1 Volume

Number of mappers Partitions

Table 14 Quality parameters of the constructed R* tree over encrypted and normal data on Apache Spark (M)

Overlap

Encrypted

Computing over encrypted spatial data generated by IoT

Fig. 18 Leaf node MBR’s area in MapReduce framework, for R trees and its variants constructed on encrypted data and normal data

123

S. V. Limkar, R. K. Jha

Fig. 19 Leaf node MBR’s in Apache Spark (on-disk computation), for R trees and its variants constructed on encrypted data and normal data

Fig. 23 CPU Utilization for indexing encrypted data and normal data in R∗ tree on sequential and parallel frameworks

9.3 Execution time

Fig. 20 Leaf node MBR’s area in Apache Spark (in-memory computation), for R trees and its variants constructed on encrypted data and normal data

Execution time is the time required for creating an index of all the trees. It calculates as the difference between start and end time required for creating the index of all the trees i.e. trees constructed on traditional sequential architecture and all three parallel architectures is shown in Figs. 24, 25 and 26.

9.2 CPU utilization CPU utilization is one of the vital parameter. The results shown the maximum CPU utilization for generating the index in all the trees, i.e. trees constructed on traditional sequential architecture and all three parallel architectures is shown in Figs. 21, 22 and 23.

Fig. 24 Execution time for indexing encrypted data and normal data in R tree using sequential and parallel frameworks

Fig. 21 CPU Utilization for indexing encrypted data and normal data in R tree using sequential and parallel frameworks

Fig. 25 Execution time for indexing encrypted data and normal data in R+ tree using sequential and parallel frameworks

Fig. 22 CPU Utilization for indexing encrypted data and normal data in R+ tree using sequential and parallel frameworks

Fig. 26 Execution time for indexing encrypted data and normal data in R∗ tree using sequential and parallel frameworks

123

Computing over encrypted spatial data generated by IoT

Fig. 27 Execution time for inserting encrypted data and normal data and reconstructing R tree using sequential and parallel frameworks

Fig. 28 Execution time for inserting encrypted data and normal data and reconstructing R+ tree using sequential and parallel frameworks

Fig. 29 Execution time for inserting encrypted data and normal data and reconstructing R∗ tree using sequential and parallel frameworks

Fig. 30 Execution time for updating encrypted data and normal data and reconstructing R tree using sequential and parallel frameworks

Fig. 31 Execution time for updating encrypted data and normal data and reconstructing R+ tree using sequential and parallel frameworks

Fig. 32 Execution time for updating encrypted data and normal data and reconstructing R* tree using sequential and parallel frameworks

9.6 Memory utilization 9.4 Execution time for inserting data Execution time for inserting data is the average time required for inserting 1000 aircrafts data and reconstructing all the trees i.e. trees constructed on traditional sequential architecture and all three parallel architectures is shown in Figs. 27, 28 and 29.

Memory utilization is one of the important factor. The results shown the maximum memory utilization for generating the index of all the trees i.e. trees constructed on traditional sequential architecture and all three parallel architectures is shown in Figs. 33, 34 and 35.

9.5 Execution time for updating data In the proposed framework position of the aircraft is not fixed at all the time rather it is continuously changing. We randomly deleted 1000 aircrafts data and inserted 1000 aircrafts data and calculated the execution time required for the reconstruction of all the trees i.e. trees constructed on traditional sequential architecture and all three parallel architectures is shown in Figs. 30, 31 and 32.

Fig. 33 Memory utilization for indexing the encrypted data and normal data in R tree using sequential and parallel frameworks

123

S. V. Limkar, R. K. Jha

Fig. 34 Memory utilization for indexing the encrypted data and normal data in R+ tree using sequential and parallel frameworks

Fig. 35 Memory utilization for indexing the encrypted data and normal data in R∗ tree using sequential and parallel frameworks

Fig. 38 Overlap area in Apache Spark on-disk computation framework, for R trees and its variants on encrypted data and normal data

Fig. 39 Overlap area in Apache Spark in-memory computation framework, for construction of R trees and its variants on encrypted data and normal data

9.8 Search time 9.7 Overlap This is the addition of all overlap area in cubic centimeters. As in our case we have used three dimensional MBR (Latitude, Longitude, and Altitude) i.e. cuboid. Its unit is cubic centimeters. Overlap of all the trees shown in in Figs. 36, 37, 38 and 39.

This is the average time required to execute spatial range query i.e. find all objects A where DISTANCE (Query, A) ≤ Dist. Example, find all the aircrafts that are within the vicinity of 150 km from Pune International Airport location. We executed this spatial range query and evaluated its performance on various trees and is shown in Figs. 40, 41 and 42.

Fig. 36 Overlap area in sequential R trees and its variants on encrypted data and normal data

Fig. 40 Execution time for spatial range query on R tree using sequential and parallel architectures on encrypted data and normal data

Fig. 37 Overlap area in MapReduce framework, for R trees and its variants on encrypted data and normal data

Fig. 41 Execution time for spatial range query on R+ tree using sequential and parallel architectures on encrypted data and normal data

123

Computing over encrypted spatial data generated by IoT

Fig. 42 Execution time for spatial range query on R∗ tree using sequential and parallel architectures on encrypted data and normal data

Table 15 give the comparative analysis of various parameter on sequential and parallel approach.

10 Conclusion and future scope Storing data on the cloud leads to compromising the security and privacy because it is very easy for an inside attacker (e.g. cloud administrator) to compromise the security. Efficient index generation and improved quality of the generated index are the two major important factor that need to be considered before firing spatial query over encrypted spatial data. For indexing encrypted geospatial data we used R tree and its variants, because these data structure are emerged as practically most efficient indexing methods, widely recognized and adopted in geospatial data management. In our proposed algorithm Mappers of both the parallel

framework i.e. MapReduce and Apache Spark uses the existing algorithm of R tree and its variants for generating index on received partitioned data in parallel, whereas reducer of both the parallel framework uses our proposed tree merge algorithm for merging the input received from exactly two mappers. We generated index on three different parallel framework i.e. MapReduce, Apache Spark on-disk computation and Apache Spark in-memory computation along with sequential processing. The result guarantees two things, first one is that, the generated R tree and its variants build by using our methods underneath parallel architectures exhibits the same features as built on sequential processing and second one is that, the proposed framework has maintained perfect secrecy while IoT generated spatial data is in transit from edge devices to cloud through unsecured communication medium and when it is stored on the cloud. From the result it’s clear that Apache Spark in-memory computation build index much faster than all the existing methods also Apache Spark in-memory computation execute spatial query over encrypted data much faster than all the existing methods. Specifically result of the spatial range query execution time over encrypted spatial data of our proposed scheme is extremely efficient which takes slightly more time as taken by normal spatial range query executed over non-encrypted spatial data in sequential processor. Our scheme is not only efficient, but also highly compatible and scalable with IoT generated spatial data.

123

123

Search time in seconds

Overlap area in cubic centimeter

389746 170547

11 0.049 0.041

R* tree

1100

R* tree

3.E+18

R* tree

0.05 0.059

R + tree 0.035 R* tree

0.031

0.62

0.38

3.E+18

4.E+18

R tree

3.E+19

3.E+19

R + tree 4.E+18

R tree

1900

200

1800

200

R + tree 200

0.031

R + tree 0.037

0.37

0.75

R* tree

10

25

R tree

49

36

153264

R + tree 13

R tree

65766

tree 89541

R* tree

R+

789905

52.16

R* tree R tree

48.25

48.22

55.30

45.85

R + tree 44.16

R tree

5.77E+21

R* tree

Memory utilization for indexing the data in kilobytes R tree

Execution time for updating data in seconds

Execution time for inserting data in seconds

Execution time required for indexing in seconds

% CPU utilization for indexing

6.41E+21

5.77E+21

1.09E+22

1.09E+22

R tree

Leaf node MBR’s area in cubic centimeter

Normal data Encrypted data

Existing sequential approach

R + tree 6.41E+21

Tree

Parameters

Table 15 Comparative result analysis of various parameter on sequential and our parallel approach

Spark on-disk

Spark in-memory

0.019

0.02

0.21

3.E+18

4.E+18

3.E+19

1200

1200

1200

0.08

0.021

0.21

7

7

26

37648

46611

220177

26.42

25.89

20.11

5.77E+21

6.41E+21

1.09E+22

0.03

0.035

0.39

3.E+18

4.E+18

3.E+19

4500

3200

2100

0.021

0.038

0.39

10

12

35

92062

97362

426831

31.25

32.79

24.41

5.77E+21

6.41E+21

1.09E+22

0.01

0.015

0.15

3.E+18

4.E+18

3.E+19

1800

1600

1300

0.01

0.012

0.18

3

4

18

25170

35467

165081

17.67

16.93

10.08

5.77E+21

6.41E+21

1.09E+22

0.018

0.025

0.30

3.E+18

4.E+18

3.E+19

4800

4400

2800

0.012

0.025

0.25

5

10

25

71368

76532

377199

21.75

24.34

15.90

5.77E+21

6.41E+21

1.09E+22

0.001

0.001

0.08

3.E+18

4.E+18

3.E+19

2000

1900

1800

0.001

0.001

0.6

1

2

3

11793

22734

82541

4.42

4.23

2.52

5.77E+21

6.41E+21

1.09E+22

0.002

0.002

0.10

3.E+18

4.E+18

3.E+19

7200

4800

4000

0.002

0.002

0.16

3

5

5

34435

41094

153364

7.91

5.09

7.33

5.77E+21

6.41E+21

1.09E+22

Normal data Encrypted data Normal data Encrypted data Normal data Encrypted data

MapReduce

Our parallel approach

S. V. Limkar, R. K. Jha

Computing over encrypted spatial data generated by IoT Acknowledgements The authors gratefully acknowledge the support provided by 5G and IoT Lab, DoECE, and TBIC-Shri Mata Vaishno Devi University, Katra, Jammu. The authors would also like to thank the anonymous reviewers for their constructive comments and suggestions to improve the quality of the paper.

Appendix After applying homomorphic encryption to the Table 8, resulting encrypted values are shown in below Table 16.

Compliance with ethical standards Conflict of interest The authors declare that there is no conflict of interests regarding the publication of this paper.

Table 16 Applying encryption to Table 8 Aircraft ID

Aircraft source airport

Aircraft destination airport

Aircraft current position

Latitude

Longitude

Latitude

Longitude

Latitude

Longitude

Altitude

1220886865

4251025105

1567333394

2129486626

2519921547

2987917025

1163618781

1910389146

2868937858

9682651704

1435182114

1041861778

7184290325

9516938177

7773015603

4495732329

1316966928

4892636756

2092556997

4104748832

6116938822

8445362362

4826278130

7194732874

0745662142

8289524879

0150244004

3090843299

8460374580

2172920444

8393801741

9046828839

3739428125

5014746323

4788821720

6757045648

9925318482

7015772396

3815702029

0625050836

8760148509

5902076007

4553443152

0066838485

2250033541

6851707862

3672438568

2392748049

6276284981

1797748520

5709309060

3552928135

3582500799

7089754083

7251083377

6696958841

7176735752

2393455953

3355780280

8743382219

3397588776

2293940618

2626369142

8893782374

6216562191

7120736942

0655148063

6549954115

7668974071

6098661878

1908589847

6982760753

8012242975

2874453564

0995787089

5705353719

7666862063

3062613633

5360548230

2263884708

6557340787

1660193403

2389939376

4393714095

3715314246

3158438671

5429767602

7377567651

3781767318

1920275304

0474253812

8676368525

7359944333

8311752195

4692561188

2438838894

3722838783

0692031506

9479246171

8352445010

5619547398

6874868068

7960604525

7466184513

2332524546

8025217349

7102909238

0175455617

1554701929

0950650120

2385104514

4311011320

1505755695

7422469201

4716302597

8015045562

9843494984

9502154994

6752927699

6735645777

4836588405

5006399032

9668271166

4873211800

7399386716

0354752138

2334892151

4381468911

6787915198

4185976485

7379972070

5429500738

2990970242

7213771946

9720543447

8849436368

2013750628

9086019661

7868009556

5623416714

5212781483

1603179167

1788587037

0132464519

4722562281

5878402030

7808202115

4732594586

2895130929

3287217424

0937887446

7323014318

4936027517

8949066326

0146221435

2651549160

2911685771

3250025784

5963467025

6351552813

8789164587

1887331521

5221063394

5688253281

4388879369

0835890586

3972060319

4175711311

3644282213

3199408974

2410548360

3480753196

6380374205

0175711576

2614797438

0229451340

6642795948

6337060215

8891221478

9294781489

1160304531

7982748514

7172717348

6220765000

2078068566

6113362098

0266570205

2681045740

7722937774

9539284026

8303818559

2285466404

6572346853

8681697903

9075978213

1590292486

4568818784

7253372657

3404559356

1430218770

6725634065

6818866082

0502102057

8550190088

2912460363

9924457973

9045544448

1241114893

3355163443

4891290975

9456860694

0858142940

6761441163

9169398002

3047709930

7890760506

9823349064

4317514898

0457844345

1536987469

9919098195

6570932877

4935305821

0732566815

6219436204

2672066993

7061225951

8142814338

9125910730

9324367170

2548615780

0063245885

0966984456

6391451526

5176575156

5939127212

5903856806

5258878299

1557046839

6677247197

85935527

5530031

34006999

2999932

44022247

51350812

8795601

22134506

123

S. V. Limkar, R. K. Jha

References 1. Atzori, L., Iera, A., & Morabito, G. (2010). The internet of things: A survey. Computer Networks, 54(15), 2787–2805. 2. Gubbi, J., Buyya, R., Marusic, S., & Palaniswami, M. (2013). Internet of things (IoT): A vision, architectural elements, and future directions. Future Generation Computer Systems, 29(7), 1645–1660. 3. http://www.gartner.com/newsroom/id/3598917. 4. https://www.idc.com/. 5. Eagle, N., & Greene, K. (2014). Reality mining: Using big data to engineer a better world (1st ed.). Cambridge: The MIT Press. 6. Madden, S. (2012). Going big on spatial data: A mobile systems perspective. In Keynote at 20th ACM SIGSPATIAL international conference on advances in geographic information systems, Redondo Beach, CA. 7. Lee, J. G., & Kang, M. (2015). Geospatial big data: Challenges and opportunities. Journal of Big Data Research, 2(2), 74–81. 8. Dasgupta, A. (2013). Big data: The future is in analytics. http://www.geospatialworld.net/Magazine/MArticleView. aspx?aid=30512, April 2013, Geospatial World. 9. Lapkin, A. (2012). Hype cycle for big data, 2012. http://www. gartner.com/document/2100215, July 2012. 10. http://aws.amazon.com/solutions/case-studies/. 11. Song, D. X., Wagner, D., & Perrig, A. (2000). Practical techniques for searches on encrypted data. In Proceedings of IEEE SP (pp. 44–55). 12. Shahabi, C., Fan, L., Nocera, L., Xiong, L., & Li, M. (2015). Privacy-preserving inference of social relationships from location data: A vision paper. In Proceedings of ACM SIGSPATIAL GIS (pp. 1–4). 13. http://aws.amazon.com/. 14. https://cloud.google.com/. 15. Katz, J., & Lindell, Y. (2007). Introduction to modern cryptography: Principles and protocols. Boca Raton, FL: CRC Press. 16. TechRadarTM : Internet of things security, Q1 2017, A mix of new and existing technologies help secure IoT deployments, January 19, 2017. 17. Song, D., Wagner, D., & Perrig, A. (2000). Practical techniques for searches on encrypted data. In Proceedings of IEEE S&P’00. 18. Curtmola, R., Garay, J., Kamara, S., & Ostrovsky, R. (2006). Searchable symmetric encryption: Improved definitions and efficient constructions. In Proceedings of ACM CCS (pp. 79–88). 19. Kamara, S., Papamanthou, C., & Roeder, T. (2012). Dynamic searchable symmetric encryption. In Proceedings of ACM CCS (pp. 965–976). 20. Cash, D., Jarecki, S., Jutla, C., Krawczyk, H., Ro¸su, M.-C., & Steiner, M. (2013). Highly-scalable searchable symmetric encryption with support for Boolean queries. In Proceedings of CRYPTO (pp. 353–373). 21. Pappas, V., et al. (2014). Blind seer: A scalable private DBMS. In Proceedings of IEEE SP (pp. 359–374). 22. Cash, D., et al. (2014). Dynamic searchable encryption in verylarge databases: Data structures and implementation. In Proceedings of NDSS (pp. 1–16). 23. Stefanov, E., Papamanthou, C., & Shi, E. (2014). Practical dynamic searchable encryption with small leakage. In Proceedings of NDSS. 24. Wong, W. K., Kao, B., Cheung, D. W. L., Li, R., & Yiu, S. M. (2014). Secure query processing with data interoperability in a cloud database environment. In Proceedings of ACM SIGMOD (pp. 1395–1406). 25. Lai, J., Zhou, X., Deng, R. H., Li, Y., & Chen, K. (2013). Expressive search on encrypted data. In Proceedings of ACM ASIA CCS (pp. 243–251).

123

26. Sun, W., et al. (2013). Privacy-preserving multi-keyword text search in the cloud supporting similarity-based ranking. In Proceedings of ACM AISA CCS (pp. 71–82). 27. Ghinita, G., & Rughinis, R. (2014). An efficient privacypreserving system for monitoring mobile users: Making searchable encryption practical. In Proceedings of ACM CODASPY’14. 28. Wang, B., Li, M., Wang, H., & Li, H. (2015). Circular range search on encrypted spatial data. In Proceedings of IEEE CNS’15. 29. Zhu, H., Lu, R., Huang, C., Chen, L., & Li, H. (2015). An efficient privacy-preserving location based services query scheme in outsourced cloud. IEEE Transactions on Vehicular Technology, 65(9), 7729–7739. 30. Wang, B., Li, M., & Wang, H. (2016). Geometric range search on encrypted spatial data. IEEE Transactions on Information Forensics and Security, 11(4), 704–719. 31. Wang, B., Li, M., & Xiong, L. (2018). FastGeo: Efficient geometric range queries on encrypted spatial data. In IEEE transactions on dependable and secure computing (Vol. PP, No. 99, p. 1). 32. de Berg, M., Cheong, O., van Kreveld, M., & Overmars, M. (2008). Computational geometry: Algorithms and applications. New York: Springer. 33. Atallah, M. J., & Du, W. (2001). Secure multi-party computational geometry. In Proceedings of the international workshop on algorithms and data structures (pp. 165–179). 34. Du, W., & Atallah, M. J. (2001). Secure multi-party computation problems and their applications: A review and open problems. In Proceedings of new security paradigms workshop (pp. 13–22). 35. Narayanan, A., Thiagarajan, N., Lakhani, M., Hamburg, M., & Boneh, D. (2011). Location privacy via private proximity testing. In Proceedings of NDSS. 36. Šed-enka, J., & Gasti, P. (2014). Privacy-preserving distance computation and proximity testing on earth, done right. In Proceedings of ACM ASIA CCS (pp. 99–110). 37. Goldreich, O. (2004). Foundations of cryptography: Basic applications (Vol. 2). Cambridge: Cambridge University Press. 38. Azad, M. A., Bag, S., & Hao, F. (2017). M2M-REP: Reputation of machines in the internet of things. In Proceedings of the 12th international conference on availability, reliability and security (ARES ’17). ACM, New York, NY, USA, Article 28, 7. 39. Ajmal, M., Bag, S., Tabassum, S., & Hao, F. (2017). privy: Privacy preserving collaboration across multiple service providers to combat telecoms spam. In IEEE transactions on emerging topics in computing. 40. Azad, M. A., & Bag, S. (2017). Decentralized privacy-aware collaborative filtering of smart spammers in a telecommunication network. In Proceedings of the symposium on applied computing (SAC ’17) (pp. 1711–1717). New York, NY: ACM. 41. Khan, M. A., & Salah, K. (2018). IoT security: Review, blockchain solutions, and open challenges. Future Generation Computer Systems, 82, 395–411. 42. Al-Fuqaha, A., Guizani, M., Mohammadi, M., Aledhari, M., & Ayyash, M. (2015). Internet of things: A survey on enabling technologies, protocols, and applications. In IEEE communications surveys and tutorials (Vol. 17, no. 4, pp. 2347–2376). Fourthquarter. 43. Botta, A., de Donato, W., Persico, V., & Pescapé, A. (2014). On the integration of cloud computing and internet of things. In 2014 International conference on future internet of things and cloud, Barcelona (pp. 23–30). 44. https://xively.com/. 45. https://thingspeak.com/. 46. http://iot-toolkit.com/. 47. http://open.sen.se/. 48. http://www.iot-a.eu/. 49. http://www.nimbits.com/.

Computing over encrypted spatial data generated by IoT 50. http://www.netlabtoolkit.org/learning/tutorials/iot-cloudservices/. 51. http://www.openiot.eu/. 52. http://www.zettajs.org/. 53. https://sites.google.com/site/opensourceiotcloud/. 54. Jiang, L., Da Li, X., Cai, H., Jiang, Z., Fenglin, B., & Boyi, X. (2014). An IoT-oriented data storage framework in cloud computing platform. IEEE Transactions on Industrial Informatics, 10(2), 1443–1451. 55. https://nodejs.org/. 56. http://www.flightradar24.com/. 57. Guttman, A. (1984). R-trees: A dynamic index structure for spatial searching. In Proceedings of ACM SIGMOD international conference on management of data. 58. Sellis, T., Roussopoulos, N., & Faloutsos, C. (1987). The R+-tree: A dynamic index for multidimensional objects. In Proceedings of the 13th international conference on very large databases (VLDB). 59. Beckmann, N., Kriegel, H.-P., Schneider, R., & Seeger, B. (1990). The R*-tree: An efficient and robust access method for points and rectangles. In Proceedings of ACM SIGMOD international conference on management of data. 60. Nievergelt, J., et al. (1984). The grid file: An adaptable, symmetric multikey file structure. ACM Transactions on Database Systems, 9(1), 38–71. 61. Kalojanov, J., & Slusallek, P. (2009). A parallel algorithm for construction of uniform grids. In Proceedings of the 1st ACM conference on high performance graphics—HPG ’09 (New York, New York, USA, 2009) (p. 23). 62. Yang, K., et al. (2007). In-memory grid files on graphics processors. In Proceedings of the 3rd international workshop on data management on new hardware—DaMoN ’07 (p. 1). 63. Finkel, R. A., & Bentley, J. L. (1974). Quad trees a data structure for retrieval on composite keys. Acta Informatica, 4(1), 1–9. 64. Samet, H. (2006). Foundations of multidimensional and metric data structures. Burlington: Morgan Kaufmann Publishers Inc. 65. Yi, K. (2008). Encyclopedia of algorithms. New York: Springer. 66. Aljafer, H., Malik, Z., Alodib, M., & Rezgui, A. (2014). A brief overview and an experimental evaluation of data confidentiality measures on the cloud. Journal of Innovation in Digital Ecosystems, 1(1–2), 1–11. 67. Acar, A., Aksu, H., Selcuk Uluagac, A., & Conti, M. (2017). A survey on homomorphic encryption schemes: theory and implementation. CoRR arXiv:1704.03578 68. Liu, J., Mesnager, S., & Chen, L. (2016). Partially homomorphic encryption schemes over finite fields. In C. Carlet, M. Hasan, & V. Saraswat (Eds.), Security, privacy, and applied cryptography engineering. SPACE 2016. Lecture notes in computer science (Vol. 10076). Cham: Springer. 69. Kaosar, M. G., Paulet, R., & Yi, X. (2012). Fully homomorphic encryption based two-party association rule mining. Data and Knowledge Engineering, 76, 1–15. 70. Rivest, R. L., Shamir, A., & Adleman, L. (1978). A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, 21(2), 120–126. 71. Paillier, P. (1999). Public-key cryptosystems based on composite degree residuosity classes. In J. Stern (Ed.), Advances in cryptology (EUROCRYPT 1999). Lecture notes in computer science (Vol. 1592, pp. 223–238). Berlin: Springer. 72. Mani, M., Shah, K., & Gunda, M. (2013). Enabling secure database as a service using fully homomorphic encryption: Challenges and opportunities. arXiv preprint arXiv:1302.2654. 73. Lee, C.-C., Chung, P.-S., & Hwang, M.-S. (2013). A survey on attribute-based encryption schemes of access control in cloud environments. International Journal of Network Security, 15(4), 231–240.

74. Liu, X., Zhang, Y., Wang, B., & Yan, J. (2013). Mona: Secure multi-owner data sharing for dynamic groups in the cloud. IEEE Transactions on Parallel and Distributed Systems, 24(6), 1182– 1191. 75. Liu, Q., Wang, G., & Wu, J. (2010). Efficient sharing of secure cloud storage services. In 2010 IEEE 10th international conference on computer and information technology (CIT) (pp. 922–929). IEEE. 76. Yu, S., Wang, C., Ren, K., & Lou, W. (2010). Attribute based data sharing with attribute revocation. In Proceedings of the 5th ACM symposium on information, computer and communications security (pp. 261–270). ACM. 77. Yang, X., Du, W., Wang, X., & Wang, L. (2013). Fully secure attribute-based encryption with non-monotonic access structures. In 2013 5th international conference on intelligent networking and collaborative systems (pp. 521–527). Xi’an. 78. Shuci, J., Weibin, G., & Guisheng, F. (2017). Hierarchy attributebased encryption scheme to support direct revocation in cloud storage. In 2017 IEEE/ACIS 16th international conference on computer and information science (ICIS) (pp. 869–874). 79. Prabhakar, D. M., & Joseph, K. S. (2013). A new approach for providing data security and secure data transfer in cloud computing. IJCTT, 4(5), 1202–1207. 80. Seo, J. H., & Cheon, J. H. (2011). Fully secure anonymous hierarchical ldentity based encryption with constant size cipher texts. IACR Cryptology ePrint Archive, 2011, 21. 81. Baek, J., Newmarch, J., Safavi-Naini, R., & Susilo, W (2004). A survey of identity-based cryptography. In Proceedings of the 10th Annual Conference for Australian Unix User’s Group (AUUG 2004) (pp. 95–102). 82. Parsha, S. K., & Pasha, M. K. (2012). Enhancing data access security in cloud computing using hierarchical identity based encryption (hibe). International Journal of Scientific and Engineering Research, 3(5), 5. 83. Wang, G., Liu, Q., & Wu, J. (2011). Achieving fine-grained access control for secure data sharing on cloud servers. Concurrency and Computation Practice and Experience, 23(12), 1443–1464. 84. Dong, X., Yu, J., Luo, Y., Chen, Y., Xue, G., & Li, M. (2013). Achieving secure and efficient data collaboration in cloud computing. In 2013 IEEE/ACM 21st international symposium on quality of service (IWQoS) (pp. 1–6). IEEE. 85. Fontaine, C., & Galand, F. (2007). A survey of homomorphic encryption for nonspecialists. EURASIP J. Inf. Secur. Article 15(January 2007). 86. Rivest, R. L., Shamir, A., & Adleman, L. (1978b). A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, 21(2), 120–126. 87. Goldwasser, S., & Micali, S. (1982). Probabilistic encryption & how to play mental poker keeping secret all partial information. In Proceedings of the fourteenth annual ACM symposium on Theory of computing (pp. 365–377). ACM. 88. ElGamal, T. (1985). A public key cryptosystem and a signature scheme based on discrete logarithms. In G. R. Blakley, & D. Chaum (Eds.), Advances in cryptology (CRYPTO 1984). Lecture notes in computer science (Vol. 196, pp. 10–18). Berlin: Springer. 89. Benaloh, J. (1994). Dense probabilistic encryption. In Proceedings of the workshop on selected areas of cryptography (pp. 120–128). 90. Naccache, D., & Stern, J. (1998). A new public key cryptosystem based on higher residues. In Proceedings of the 5th ACM conference on computer and communications security (pp. 59–66). ACM. 91. Okamoto, T., & Uchiyama, S. (1998). A new public-key cryptosystem as secure as factoring. In Advances in cryptology— EUROCRYPT’98 (pp. 308–318). Springer.

123

S. V. Limkar, R. K. Jha 92. Paillier, P. (1999). Public-key cryptosystems based on composite degree residuosity classes. In Advances in cryptology— EUROCRYPT’99 (pp. 223–238). Springer. 93. Yao, A. C. (1982). Protocols for secure computations. In Proceedings of the 23rd Annual Symposium on Foundations of Computer Science (SFCS ’82) (pp. 160–164). Washington, DC: IEEE Computer Society. 94. Sander, T., Young, A., & Yung. M. (1999). Non-interactive crypto computing for NC1. In 40th annual symposium on foundations of computer science (pp. 554–566). 95. Boneh, D., Goh, E. J., & Nissim, K. (2005). Evaluating 2-DNF formulas on ciphertexts. In J. Kilian (Ed.), Theory of cryptography (TCC 2005). Lecture notes in computer science (Vol. 3378, pp. 325–341). Berlin: Springer. 96. Ishai, Y., & Paskin, A. (2007). Evaluating branching programs on encrypted data. In S. P. Vadhan (Ed.), Theory of cryptography (TCC 2007). Lecture notes in computer science (Vol. 4392, pp. 575–594). Berlin: Springer. 97. Gentry, C. (2009). A fully homomorphic encryption scheme. Ph.D. dissertation. Stanford University. 98. Van Dijk, M., Gentry, C., Halevi, S., & Vaikuntanathan, V. (2010). Fully homomorphic encryption over the integers. In Advances in cryptology—EUROCRYPT 2010 (pp. 24–43). Springer. 99. Brakerski, Z., & Vaikuntanathan, V. (2011). Fully homomorphic encryption from ring-LWE and security for key dependent messages. In Advances in cryptology—CRYPTO 2011 (pp. 505–524). Springer. 100. López-Alt, A., Tromer, E., & Vaikuntanathan, V. (2012). Onthe-fly multiparty computation on the cloud via multikey fully homomorphic encryption. In Proceedings of the forty-fourth annual ACM symposium on theory of computing (pp. 1219–1234). ACM. 101. Gjøsteen, K. (2004). Subgroup membership problems and public key cryptosystems. Ph.D. Dissertation, Norwegian University of Science and Technology. https://brage.bibsys.no/xmlui/ bitstream/handle/11250/249681/121977_FULLTEXT01.pdf? sequence=1. 102. Galbraith, S., Gebregiyorgis, S., & Murphy, S. (2016). Algorithms for the approximate common divisor problem. LMS Journal of Computation and Mathematics, 19(A), 58–72. 103. Atmospheric and space flight vehicle coordinate systems, ANSI/AIAA R-004-1992. 104. Stevens, B. L., & Lewis, F. L. (1992). Aircraft control and simulation. New York: Wiley. 105. Zipfel, P. H. (2000). Modeling and simulation of aerospace vehicle dynamics. AIAA education series. Reston, Virginia. 106. Moon, B., Jagadish, H. V., Faloutsos, C., & Saltz, J. H. (2001). Analysis of the clustering properties of the hilbert space-filling curve. IEEE Transactions on Knowledge and Data Engineering, 13(1), 124–141. 107. Zhou, X., Abel, D. J., & Truffet, D. (1998). Data partitioning for parallel spatial join processing. GeoInformatica, 2(2), 175–204. 108. Muthukrishnan, S., Poosala, V., & Suel, T. (1999). On rectangular partitionings in two dimensions: Algorithms, complexity, and applications. In ICDT (pp. 236–256). 109. Scheuermann, P., Weikum, G., & Zabback, P. (1998). Data partitioning and load balancing in parallel disk systems. The VLDB Journal, 7(1), 48–66. 110. Nguyen Thai, B., & Olasz, A. (2015). Raster data partitioning for supporting distributed GIS processing. In The international archives of the photogrammetry, remote sensing and spatial information sciences, volume XL-3/W3, 2015 ISPRS geospatial week 2015, 28 Sep–03 Oct 2015, La Grande Motte, France. 111. Aji, A., Vo, H., & Wang, F. (2015). Effective spatial data partitioning for scalable query processing. CoRR arXiv:1509.00910

123

112. Vo, H., Aji, A., & Wang, F. (2014). SATO: A spatial data partitioning framework for scalable query processing. In Proceedings of the 22nd ACM SIGSPATIAL international conference on advances in geographic information systems (SIGSPATIAL ’14) (pp. 545– 548). New York, NY: ACM. 113. Hershberger, J., Shrivastava, N., Suri, S., & Toth, C. D. (2006). Adaptive spatial partitioning for multidimensional data streams. Algorithmica, 46(1), 97–117. 114. Akdogan, A., Indrakanti, S., Demiryurek, U., & Shahabi, C. (2015). Cost-efficient partitioning of spatial data on cloud. In 2015 IEEE international conference on big data (big data), Santa Clara, CA (pp. 501–506). 115. Ferhatosmanoglu, H., Agrawal, D., Egecioglu, Ö., & El Abbadi, A. (2005). Optimal data-space partitioning of spatial data for parallel I/O. Distributed and Parallel Databases, 17(1), 75–101. 116. Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51, 107–113. 117. Apache Hadoop Project. Open source software for reliable, scalable, distributed computing [EB/OL] [2010-09-18]. http:// hadoop.apache.org. 118. http://spark.apache.org/. 119. http://storm.apache.org/. 120. https://impala.incubator.apache.org/. 121. http://giraph.apache.org/. 122. Lee, D. T., & Preparata, F. P. (1984). Computational geometry: A survey. IEEE Transactions on Computers, C–33(12), 1072–1101. 123. http://www.gns-gmbh.com/index.php?id=238&L=1. 124. https://my.vmware.com/en/web/vmware/free#desktop_end_ user_computing/vmware_workstation_player/12_0. 125. https://www.cloudera.com. 126. Hill, M. D., & Marty, M. R. (2008). Amdahl’s law in the multicore era. Computer, 41(7), 33–38.

Suresh V. Limkar is currently an assistant professor in Computer Engineering Department, AISSMS Institute of Information Technology, Pune, Maharashtra, India. He received the B.E. degree in Computer Science Engineering from Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, India, M.E. degree in Computer Engineering from Pune University, Pune, India. He is currently pursuing Ph.D. degree in Electronics and Communication Engineering at Shri Mata Vaishno Devi University, Katra, Jammu and Kashmir, India. He had published his research work in more than 25 International Conferences and Journals. His research interest includes the Internet of Things, Network Security and Location Based Services, Currently he is doing his research work on efficiently processing IoT generated data and security issues IoT data transmission. He is a member of IEEE, life member of CSI and ISTE.

Computing over encrypted spatial data generated by IoT Rakesh Kumar Jha is currently an assistant professor in school of electronics and communication department, SMVD University Katra (J&K). He is carrying out his research in WiMAX and Security issues in the laboratory ECED Lab, SMVDU. Involved research topics include WiMAX performance analysis, LBRRA, power optimization and security analysis. He has done B.Tech. in Electronics and Communication from Bhopal and M.Tech. from NIT Jalandhar, India. Received his Ph.D. degree from NIT Surat in 2013. Published more than 50 International Conference and Journal papers. His area of interest is Wireless communication, Communication System and computer network, and Security issues (Opti System). Dr. Jha’s one concept related to router of Wireless Communication has been accepted by ITU (International Telecommunication Union) in 2010. He has received young scientist author award by ITU in Dec 2010, APAN fellowship in 2011 and student travel grant from COMSNET 2012. He is a member of IEEE, GISFI and SIAM, International Association of Engineers (IAENG) and ACCS (Advance Computing and Communication Society).

123

Computing over encrypted spatial data generated by IoT

Recommend Documents