Telecommunication Systems https://doi.org/10.1007/s11235-018-0479-4
Computing over encrypted spatial data generated by IoT Suresh V. Limkar1 · Rakesh Kumar Jha1
© Springer Science+Business Media, LLC, part of Springer Nature 2018
Abstract Proliferation of IoT devices produces the enormous amount of data that need to be stored on clouds. A main focus of this paper is to ensure the secrecy of data, while it is in transit through unsecure communication media from edge devices to cloud and to provide security to the data once it is stored on public cloud. A new techniques based on computing over encrypted data i.e. homomorphic encryption seems to be promising methods. However, most of the previous works supporting computing over encrypted data are neither efficient nor compatible with IoT data and real time spatial data streams because there is a huge difference between normal data and spatial data. In this paper, we proposed a framework for computing over IoT generated encrypted spatial data. In order to provide computation over encrypted data first it needs to be indexed in standard data structure. For indexing encrypted data, we used R tree and its variants. We also proposed a method of most efficient and scalable, parallel construction of R trees and its variants on real time encrypted spatial data. We fired spatial range queries on encrypted spatial data. Specifically, the spatial range query execution time over encrypted spatial data of our proposed scheme is extremely efficient which takes slightly more time as taken by normal spatial range query executed over non-encrypted real time spatial data. Our scheme is not only efficient, but also highly compatible and scalable with IoT generated spatial data. Moreover, we rigorously define the scalability, query performance time, analyze the security of our schemes, and also conduct extensive experiments with a real time spatial dataset to demonstrate the performance of our schemes. Keywords Data security · Cloud computing · IoT · Zetta · ADS-B · R tree · MapReduce · Apache Spark · Homomorphic encryption
1 Introduction Internet of things (IoT) [1,2] is a fast-growing collection of internet-connected sensors embedded in a wide-ranging variety of physical objects i.e. things. Whereas things can literally be any physical object (animate or inanimate) on the planet to which you could connect or embed a sensor. Sensors can take a large number of possible measurements. Internet connectivity to the things can be either wired or wireless. These things are producing enormous amounts of new, structured, unstructured, real-time big data. The IoT will never go to sleep; billions of internet connected things will become an information collector and also huge information producer.
B
Rakesh Kumar Jha
[email protected] Suresh V. Limkar
[email protected]
1
School of Electronics and Communication Engineering, Shri Mata Vaishno Devi University, Katra, J&K, India
Gartner, Inc. [3] predicts that 8.4 billion devices will be in use across the world till the end of 2017, i.e. 31% more than in the year 2016, and will reach 20.4 billion by the end of the year 2020. Whereas it generate 400 ZBs of data up to year 2018, up from 113.4ZBs in 2013. A zettabyte is a trillion gigabytes. IDC [4] forecasted that by 2020, machine generated data will be 1/10th of the total generated world’s data. Day by day there is enormous improvement in the sensor and communication technology, resulting into generating new sources of geospatial big data [5,6]. The geospatial data are found in the various domains such as military, politics, medical, marketing, resource, environment and economy, etc., which is highly confidential and sensitive, if such information is illegally taken, forged or spoiled leads to countless loss. Geospatial big data [7] is type of spatial data which surpasses the capacity of current computing systems. Geospatial data contributes a major part of today’s big data and the volume of such data is increasing rapidly by 20% in every year
123
S. V. Limkar, R. K. Jha
Fig. 1 Gartner’s hype cycle [9]
[8]. From Garner’s hype cycle in Fig. 1 [9], it is clear that geospatial big data is at peak stage of expectation which is highlighted in red colored rectangle. It also shows in memory database management system is a necessity, which is highlighted in blue colored rectangle. For example, FitBit application which is used for common health monitoring. The FitBit gathers various types of data such as quality of sleep, step-counts, heart rate and location. This type of data can be used to find personal private information such as whether user’s are suffering from chronic diseases, user’s social interactions, user’s mental health conditions. Inferring such personal information might be of great value, from the user’s perspective. However, the user not at all interested in sharing such information with others i.e. corporate service provider. User always try to avoid unauthorized sharing of data, along with disclosure of sensitive information. For securing spatial data, proper encryption mechanism must be applied. Very few researchers paid attention towards encrypting spatial data. Necessity of cloud storage is unavoidable for the dramatically growing IoT data, not only for storage purpose, but also to make this data available and easily accessible to other devices and services. As it is not easy nor even lucrative for number of companies and organizations to maintain a huge
123
amount of data on their local servers. Thus, it is very common to see companies and organizations [10], outsourcing their data including spatial big data to public cloud providers such as Amazon [11] and Google [12] cloud. However, the violation of privacy and security incidents in the cloud ware observed, outsourcing data to the cloud leads to compromising the security and privacy [13] and also it is very easy for an inside attacker (e.g. cloud administrator) to compromise the security. For example, compromising the spatial datasets outsourced by Foursquare through the transgression of Amazon Web Services (AWS) would put in risk millions of users’ private location data. So researchers are continuously working hard to design a security mechanism where we can delegate the ability to process our data to untrusted party, without giving away access to it i.e. in simple way if we wish, not to compromise the privacy and security of dataset, then apply traditional encryption techniques [14] before outsourcing dataset to cloud, but certainly it introduces difficulties in terms of search functionalities over encrypted data. However, the disadvantage of this approach is twofold: (i) in order to process the geospatial big data computing power of the cloud is not considered; and (ii) while analyzing the geospatial big data, user requires high bandwidth and high computing power on the user-side, and it results in unde-
Computing over encrypted spatial data generated by IoT Table 1 Spatial data versus non-spatial data Spatial data
Non-spatial data
Data that describe location. Which are expressed with the graphic primitives i.e. points, polygons or pixel’s
Data that is independent of all geometric consideration
For e.g. It includes location, shape, size and orientation
For e.g. person’s height, mass, weight and age are non-spatial data
These types of data are multidimensional and auto correlated
These types of data are single dimensional and independent
Spatial data are difficult to create
Non-spatial data are easy to create
A spatial database is often used as a storage container for geospatial data
Non-spatial database is often used as a storage container for non-spatial data
Spatial database has the ability to store and access both Location/Spatial Information and Attributes/Non-spatial Information
Non-spatial Database: has the ability to store and access only Attributes/Non-Spatial Information
A spatial database provides special functions and indexes for querying and manipulating geospatial data using something like Structured Query Language (SQL)
Non-spatial database doesn’t provide such functions and indexes
A spatial database uses spatial query
Non-spatial database don’t support spatial queries
Examples
Examples
1. SpatiaLite
1. SQLite
2. Oracle Spatial/Locator
2. Oracle
3. MySQL Spatial
3. MySQL
4. PostGIS
4. PostgreSQL
5. DB2 Spatial
5. IBM DB2/Informix
6. MsSQL Spatial
6. MsSQL server
7. Spatial Hadoop
7. Hadoop 8. MS Access
sired application delays because first user needs to download, decrypt, and finally process the geospatial big data. Furthermore, users lose the power to search their geospatial big data, which is the most important task considered for the growing geospatial data. Again, traditional encryption techniques are well appropriate for non-spatial data, as spatial data possesses some unique different properties. Difference between spatial data and non-spatial data is shown in Table 1. As per Forrester’s analysis shown in Fig. 2, IoT encryption seems to be attained a significant success in past and predicted 3–5 years to reach to full pick level i.e. at equilibrium phase. Moreover, [15] suggested all IoT encryption must be go through the full encryption key lifecycle managing processes, because poor key management will reduce overall security. Rest of the paper is organized as follows. Section 2 discusses the back ground of the work, discussion on system model is given in Sect. 3, Sect. 4 discusses the different indexing structure used for geospatial data, and Sect. 5 discusses the different encryption methods required for storing data on cloud followed by proposed work. Experimental setup is given in Sect. 6, threat model in Sect. 7. Section 8 discuss security analysis with the help of attack model and lastly
Fig. 2 Forrester’s analysis [16]
Sect. 9 gives the experimental result followed by conclusion and future scope.
123
S. V. Limkar, R. K. Jha
2 Background of research Need of cloud storage is unavoidable for the dramatically growing IoT data, not only for storage purpose, but also to make this data available and easily accessible to other devices and services. As it is not easy nor even lucrative for number of companies and organizations to maintain a huge amount of data on their local servers. Thus, it is very common to see companies and organizations, outsourcing their data including spatial big data to public cloud providers such as Amazon [13] and Google [14] cloud. However, the violation of privacy and security incidents in the cloud are observed, outsourcing data to the cloud leads to compromising the security and privacy [29] and also it is very easy for an inside attacker (e.g. cloud administrator) to compromise the security. For example, compromising the spatial datasets outsourced by Foursquare through the transgression of Amazon Web Services (AWS) would put in risk millions of users’ private location data. organization want to compute on the data, the data must be first downloaded and decrypted, resulting not utilizing the advantage of cloud computing. In order to utilize the advantage of the cloud computing, companies must use homomorphic encryption [79]. It is a method of encryption which permits computations on encrypted data, without decrypting it. Searchable Encryption [17] is an efficient approach to facilitate search functionalities on cipher text over public cloud without decryption. Specifically with searchable encryption scheme, user can get accurate search results from an honest-but curious server without disclosing private data or queries. A various searchable encryption techniques [18– 26] have been proposed, most of them emphasis on common SQL queries such as keyword search, which are not suitable for geospatial data, as we have already mentioned normal dataset and geospatial dataset are having different properties. In recent years, a few searchable encryption techniques [27–31] have drawn their attentions particularly to geometric range queries over spatial datasets, where a geometric range query can be executed such as to retrieves points inside a geometric shape i.e. circle or a polygon [32]. Researchers in [28] able to execute circular range query on encrypted geospatial data using a symmetric-key and verified its security under selective chosen-plaintext attacks. Limitation of this scheme is, it can execute only circular range query. Researchers in [30] designed a symmetric-key searchable encryption scheme that can support geometric range queries on encrypted spatial data, that too applicable to any general geometric shape in order to overcome the limitation of [28] and verified its security under selective chosen-plaintext attacks.
123
Researchers in [31] proposed the concept of geometrically searchable encryption FastGeo to boost search time over 100 times by indexing spatial data in to hash table. Researchers in [33,34] proposed a method where two parties can privately compute and check whether a particular point is inside a geometric shape or not based on secure multiple party computation. Similarly, authors in [35,36] proposed a method where two or more users can securely verify whether one user is inside circle, which can help two users to securely confirm whether one user is inside a circle of another user based their locations, is also based on Secure Multi-party Computation [37]. Researchers in [38] proposed a novel machine to machine reputation system which enables machines to securely compute the global trustworthiness of machines in a complete decentralized and that too preserving privacy. The proposed model ensures that the machines cannot learn anything about the participating machines other than the aggregated global reputation score which is not privacy sensitive. Researchers in [39] proposed a novel system for the effective spam detection and blocking of spammers with the help of the decentralization and homomorphic cryptography. Proposed system is easily scalable to handle large number and also improves the rate of detection of spam and block the spammers with less overhead and very small communication cost. Researchers in [40] proposed a decentralized reputation aggregation protocol that enables multiple telecommunication operators (OPs) to actively take part in a collaboration process in the absence of trusted third party centralized system and without developing a predefined trust relationship with other OPs. With the exchange of cryptographic scores among OPs collaboration in between multiple Ops is achieved. Researchers in [41] presented a survey of primary security issues for IoT and categorize most popular security issues with respect to layered architecture of IoT. It also outline the security requirements for IoT along with the existing attack models and threats with its standard solutions. It also discuss the blockchain technology, which solve most of the IoT security problems. Above work [32–36] are based on secure multi-party computation, which require extensive rounds of interactions amongst all the parties involved in the computation. However, none of the above techniques cannot support execution of spatial queries over IoT generated data and is still remains open challenge. Limitation of existing work are listed out below. (i) Key management is tedious task Techniques mentioned in [28], [30], [31], are supported for private data produced by only one device. For e.g. if number of data owners are 10, then data owners need to acquire
Computing over encrypted spatial data generated by IoT
10 different tokens for encrypting there query, cloud server works on 10 different data owner and return 10 different result, which takes more time for searching and collecting the result from10 different data owners, ultimately results into more utilization of network bandwidth and cloud resources. Managing a key amongst number of private data producer is tedious task. These techniques are not suitable for IoT devices, as in case of IoT, billions of devices i.e. data producers exist. (ii) Based on conventional encryption Conventional encryption is not fast as homomorphic encryption because homomorphic encryption is based on pure mathematical operations. (iii) Based on two dimensional dataset All existing papers are based on two dimensional spatial data i.e. latitude and longitude. Whereas we considered three dimensional spatial data i.e. latitude, longitude and altitude.
2.1 Contributions The first optimal goal of this paper is to provide at most security to IoT generated geospatial data while it is in transit from edge devices to cloud and from third party cloud i.e. public cloud. This is accomplished by using homomorphic encryption. The second optimal goal of this paper is to reduce the time required for indexing the encrypted geospatial data generated from IoT devices for efficient processing spatial queries on the cloud. This is accomplished by using most advanced open source parallel architecture framework, i.e. MapReduce and Apache Spark and the third optimal goal of this paper is to reduce the execution time required for executing spatial query on encrypted geospatial data. Our technical contributions in this paper can be summarized as follows. I. Proposed an efficient parallel architecture framework for processing real time encrypted geospatial big data generated by internet connected devices for the aviation industry. This is achieved through indexing encrypted geospatial big data in R tree and its variants. II. Proposed framework is simulated in IoTZetta architecture, which eliminates the existing issues in IOT. III. Proposed a novel algorithm for parallel merging of two R-tree and its variants using the two frameworks i.e. Apache MApReduce and Apache Spark framework. IV. We experimentally evaluated and compared our proposed algorithms in terms of execution time, insertion time, update time, search time and scalability with respect to normal IoT data (i.e. non-encrypted IoT generated Geospatial big data). We used various metrics to measure the quality of the resulting trees.
V. To the best of our knowledge, this is the first paper which exploit the advantages of Apache Spark over MapReduce framework for encrypted geospatial big data processing with in-memory computation and ondisk computation. VI. To the best of our knowledge, this is the first paper which exploit the advantages of homomorphic encryption for IoT generated real time spatial data. Rest of the paper is organized as follows. Discussion on system model of proposed work is given in Sect. 3, which covers comprehensive survey on various IoT platforms, issues in existing IoT Platforms, Simulation of proposed work in IoT Zetta, IoT Zetta component stack. Section 4 discusses the different indexing structure used for geospatial data. Section 5 discusses the different encryption methods required for storing data on cloud followed by proposed work. Experimental setup is given in Sect. 6. Section 7 discuss about the theoretical analysis of the proposed system and lastly Sect. 8 gives the experimental result followed by conclusion and future scope.
3 System model for proposed framework Various IoT platforms [42,43] are studied and summarized in Table 2 with its features. We selected IoTZetta architecture because it overcomes the issues in existing IoT, which are described in Table 3.
3.1 Zetta architecture [52] Zetta is an open source platform completely developed in Node.js [55]. It is mainly used for creating IoT server which can run anywhere across the geographically scattered computers. Zetta is the amalgamation of three technologies i.e. websockets, reactive programming and REST API. Such combination is highly suitable for grouping large no. of devices at one place for real time application such as smart home and smart farming. Zetta is a lightweight process that resides in more than one place such as on the edge of the network, on a hardware hub, Raspberry Pi, BeagleBone, data center or on a cloud server. Zetta will mediate protocols in the hardware hub, i.e. nothing but separation between API technology and Device protocol technology. Zetta process create a secure link between each other using Z2Z protocol. Zetta maintains a layered approach, keeping components pluggable and extensible. Zetta is developer friendly because it is based on the API.
3.2 Simulation We simulated aviation industry application in Zetta. Overall simulation environment comprises three modules; such as
123
S. V. Limkar, R. K. Jha Table 2 Summary of various IoT platform Platform
Features Proprietary things
Open things
Private cloud
Xively [44]
ThingSpeak [45]
IoT Toolkit [46]
n/a
Open.Sen.se [47]
openPicus [48]
NimBits [49]
Public cloud
n/a
Free
Application API
Last update
Ready to use
n/a
n/a
n/a
Dec 13
n/a
Open source
n/a
n/a
n/a
n/a
n/a
n/a
Partly
n/a
Mar 14
NetLab [50]
n/a
Dec 13
OpenIoT [51]
Dec 14
IoT Zetta [52]
Apr 17
IoTCloud [53]
n/a
n/a
May 13
Table 3 IoT Issues with its description’s and solution provided by Zetta architecture S. no.
Issues in IoT
Description
Solution provided by Zetta
1
Large number of protocols and its interfaces are available [42,43]
Existence of large number of protocols and its interfaces, amongst them which one is to use? Whether a provision is made for standard future proofing and security provision against these new protocols or not made is unknown
Use of HTTP protocol to communicate with all devices. Every device gets an API generated from Node.js
2
IoT Consist of large number of devices [1,2]
Large no of devices are involved in IoT system, there is necessity to have a standard mechanism to coordinate these large number of devices by providing various types of communication such as D2D: Device to Device, D2G: Device to Gateways, D2C: Device to cloud, S2S: server to server communication in large IoT systems and must improve on emergent behaviors of these devices in large systems
With the use of REST API, web sockets and reactive programming, leads to easy coordination of devices across the globe
3
IoT Devices generate huge amount of data [54]
In upcoming years IoT will become the biggest source of data on the planet and we must be prepared for handling such huge data. We must address issues related to IoT Data such as, which data need to be stored, what techniques we must adopt for storing, how to extract meaningful information and learn from this IoT Data
Broadcasting data over web sockets made data analyzing more easy
IoTZetta Clients, IoTZetta Server and IoTZetta Cloud Server. All these modules are discussed in details.
3.2.1 IoTZetta clients Skies around the world are increasingly crowded because of rising number of flights. Automatic dependent surveillance— broadcast (ADS-B) enables the widespread use of satellite based GPS technology in the navigation. It provides Air
123
traffic control (ATC) information that help aircraft pilot to efficiently navigate in increasingly congested space. ADS-B allows aircrafts to determine its exact location using Global Positioning System (GPS). This received information is broadcast to other nearby aircrafts and ADS-B receivers, which relay the information to ATC. To make this possible, aircraft will be required to continuously broadcast their position, i.e. through ADS-B Transponder and optionally additional equipment that enables the aircraft to receive those from another plane as well as information from ground
Computing over encrypted spatial data generated by IoT
(i) Driver It facilitates aircrafts to be represented in terms of a state machine and facilitates interaction through APIs such as query devices to get real time sensor data streams, inter-
ADS-B Receiver
ADS-B Receiver
Driver Server Extensions
Scouts
Zea Server Clients Registry
Key Pair Generator
IoT Zea Cloud Server
GPS Satellite
ADS- B receiver
ADS - B Transponder equipped aircra
Fig. 3 Proposed system model for real time data encryption
Flightradar24 Web Streaming
Driver Server Extensions Scouts Kay Pair Generetor
Clients Registry
IoT Zea Server HTTP API (JSON) & Websocket
It is an uppermost level of abstraction in Zetta. It runs on any hardware such as Intel Edison, Raspberry PI, etc. It generates APIs for each devices to whom it has to connect and coordinates interaction amongst all devices. Mainly it consist of the following components.
ADS-B Receiver
HTTP API (JSON) & Websocket
3.2.2 IoTZetta server
IoT Zea Clients
IoT Zea API
stations. This continuous instantaneous cycle to broadcast give ATC and pilot with complete and more up to date picture of the airspace. ADS-B provides updates 12 times more frequently than the radar, which means better coordinate for takeoffs and landings, less time on the ground or circling in the air, nor directs routs and fewer delays. Increased airspace efficiency is another cheap benefit of ADS-B. Earlier systems are limited to, land based areas, leaving 70% of the world area uncovered creating inflexible and inefficient routing over much of the world’s aerospace. These global blind spots disappear in the beginning of year 2017. Currently, ADS-B receiver depends on satellite relay signals from all ADS-B equipped aircrafts in the ATC worldwide. This provides new real time visibility of ADS-B equipped aircraft in any flight information region, including current surveilled ocean, polar, dessert and not necessary space. Satellite based ADS-B will enable the optimization of the aircraft’s path, increase operational and fuel efficiency for airlines and significantly enhance safety all in less investment and in the less infrastructure. With the adoption of satellite based ADS-B surveillance no aircraft will ever be lost again. In this module aircraft is connected to the internet through ADS-B transponder and ADS-B receiver. ADS-B Transponder is installed in the aircraft for receiving its current location information from a constellation of GPS Satellites and sending its current location to ADS-B receiver. The ADS - B receiver is used for real time tracking of aircraft position from the base station. The ADS - B receiver is connected to the Zetta server though IoT API. We simulated two types of IoTZetta clients. In first scenarios we used three physical ADS-B receiver which could receive the real time aircrafts location update information within its range but it could fetch only few aircrafts. First Simulated scenario is shown in Fig. 3. The second scenario is simulated, in order to get more number of aircrafts encrypted real-time position updates. For this we used real time database provided by global flight systems, i.e. flightradar24 [56]. It provides real-time position updates of thousands of aircrafts across the world. Second simulated scenarios are shown in above Fig. 4.
IoT Zea Cloud Server
Fig. 4 Proposed system model for real time data encryption
123
S. V. Limkar, R. K. Jha
act with aircrafts to know its current status and locations, provides links between servers, which establishes a secure connection between two Zetta servers for this it uses Z2Z protocol for efficiently proxy API requests, and streaming data between servers. (ii) Scouts It mainly involves in the discovery mechanism, i.e. searching for any new aircrafts and if it founds any new aircraft in the sky then it report back to the server. (iii) Server extensions These are the pluggable models required for adding additional APIs and providing additional securities to APIs. (iv) Registry It is a small local database which holds information about all the aircrafts which are in the sky and the server itself. (v) Key pair generation It mainly involves in generating key pairs for the homomorphic encryption and decryption.
Application Layer
JSON API
Mediation Layer
IoT Zetta Drivers
Protocol layer
Using ADS-B Device Specific protocol
Network and Transport Layer
Using Serial (Web API)
3.2.3 IoTZetta cloud server IoTZetta server is connected to the IoTZetta cloud server through HTTP APIs (JSON) and websocket. Mainly it holds all the real time encrypted spatial data received from IoT Zetta server for further processing and doing analytics.
3.3 IoTZetta components IoT Zetta components are stacked in five layers. In physical and data link layer ADS-B receiver is used to receive the aircraft’s real time data in its range through universal access transceivers (UAT) and 1090 MHz extended squitter. The data which is received at physical and data link layer is forwarded to network and transport layer using serial communication and web APIs. Protocol layer uses ADS-B receiver specific protocol, whereas mediation layer uses IOTZetta drivers. Application layer composed of JSON APIs. Details IoT Zetta components is shown in the Fig. 5.
4 Realization of different indexing structure for proposed framework In order to index the IoT generated encrypted spatial data, we used R tree [57] and its variants [58,59]. These data structures have emerged as practically most efficient indexing methods and widely accepted, adopted in geospatial data management. Most popular spatial indexing techniques are summarized in Table 4.
123
Physical and Data Link layer
Using Universal Access Transceiver (UAT) & 1090 MHz Extended Squitter
Fig. 5 IoTZetta components
4.1 Introduction to R tree R trees [57] are similar to B+ trees, which is a height balanced tree. It consist of three main parts i.e. root node, internal node and leaf node. Root node holds the pointer reference to the larges region(s). Internal nodes hierarchically stores pointers to their child nodes for storing data bounded with rectangles parent’s area. Leaf nodes are used to stores the actual data with respective minimum bounding region (MBR). MBR is basically a sub region within the entire space that group data efficiently. Just like in B+ trees, in R tree each node contains at least m and at most M entries, where m ≤ M/2. R tree variants are evolved for improving the performance by fine-tuning some parameters, such as in R* trees [59], which is widely accepted for achieving the best performance, a number of parameters were introduced such as forced re-insertions during insertions, buffering and optimization criteria for splitting/merging nodes and adjusting the complex MBRs. The comprehensive comparison amongst R tree and its variants is given in the Table 5.
Computing over encrypted spatial data generated by IoT Table 4 Summary of various spatial indexing techniques Characteristics
Major spatial indexes Grid file [60–62]
Quad tree [63,64]
R-tree [57,65]
Partition technique
Space-based
Space-based
Data-based
Handling skewness
Very poor
Moderate
Very good
Complexity
Index creation and tuning is more complex
Tuning is more complex
Index creation and tuning are easier
Storage
More storage is required
More storage is required
Less storage is required
Whole-earth model
Possible
Not-possible
Possible
Hierarchical structure
Absent
Present
Present
Multidimensional
No
No
Yes
Query processing
Less efficient
Less efficient
More efficient
Parallelization friendly
Substantial
Moderate
Moderate
Table 5 Comprehensive comparison amongst R tree and its variants S. no.
1
Characteristics
Overlap
R tree and its variants R tree [57]
R+ tree [58]
R∗ tree [59]
More overlap
Minimum overlap as compared to R tree
Minimum overlap as compared to R tree and R+ tree
2
Pruning
Less pruning
Minimum pruning
Improved pruning
3
Dead space
More dead space
Minimum dead space compared to R tree
Minimum dead space compared to R tree and R+ tree
4
Advantages
Construction and maintenance of R trees is simple
Nodes are not overlapped with each other, better performance of point query
Minimum Coverage and minimum overlap
Nodes are overlapped with each other 5
Disadvantages
R tree are smaller than R+ tree built over the same data
1. Since rectangles are duplicated, R+ tree can be larger than R tree built over same data 2. Construction and maintenance of R+ trees is more complex than R tree
Construction and maintenance of R∗ trees is more complex than R tree and R+ trees
6
Coverage
More coverage
Minimal than R tree
Minimal then R and R+ tree
5 Realization of different encryption scheme for proposed framework To achieve the security of data, encryption is the best method. Encryption is defined as converting data from one form to another form i.e. from plane text to cipher text. However, the traditional encryption methods won’t work on the cipher text without converting it into plain text. The user have to sacrifice the privacy in order to make use of cloud services. Furthermore, untrusted public cloud service providers or cloud operators can keep watch on user’s data after users end the association with the service providers [66].
Different evaluation criteria’s [66] considered for comparison of several encryption methods used in the cloud for data confidentiality are listed below: a. Confidentiality of data: Any unauthorized user should not come to know about the information of encrypted data. b. Fine-grained access control: Each individual user linked with different access rights c. Scalability: System must be scalable and efficiently work even when the number of authorized users increases. d. User responsibility: When authorized user disclose his credential i.e. key to unauthorized users then authorized user must be responsible. e. User revocation: If authorized user leaves the system, then his/her private key should be revoked and access
123
S. V. Limkar, R. K. Jha Table 6 Comparison of various encryption scheme used for data confidentiality in the cloud Criteria
Homomorphic encryption [67–72]
Data confidentiality
×
Fine grained access
Scalability
×
×
×
User revocation
×
User rejoin
×
Collision resistance
Fast
×
×
×
User responsibility
×
×
×
Cipher text size
Larger
Larger
Larger
f.
g.
h. i.
ABE [73, 74]
KP-ABE [75]
to the data denied. This must be done without affecting existing users in the system. User re-join: Ability of considering the authorized user back into the system after revocation, without affecting existing users in the system. Collusion resistant: At any cost encrypted data files cannot be decrypted by combining the attributes of authorized users. Fast: This refers to the time required for performing encryption on plaintext. Cipher text size: This refers to the size of the generated cipher text.
Table 6 shows the comparison of various encryption scheme used for data confidentiality in the cloud.
5.1 Homomorphic encryption In this section, we explain basics of the Homomorphic Encryption (HE). Then, we present notable difference in, Partial Homomorphic Encryption (PHE), Somewhat Homomorphic Encryption (SWHE) and lastly Full Homomorphic Encryption (FHE) schemes. With the help of Homomorphic Encryption (HE) public cloud can able to perform certain computation on the encrypted data while protective the privacy of the data. This section also discuss, how this encryption is applied to our proposed method. A HE [85] method is mainly categorized by four operations such as Key Generation, Encryption, Decryption, and Eval function. Key Generation is the procedure for generating a pair of secret and public key for asymmetric form of HE or a single key for symmetric form of HE. Basically, the operations of key generation, encryption and decryption are same as conventional encryption methods. However, Eval is a HE-specific operation, whose input is cipher texts and also outputs is a cipher text which is exactly corresponds to plain text. Eval allows the function f () above the cipher texts with-
123
CP-ABE [76]
ABE W. non monotonic [77]
HABE [78]
AES [79]
HIBE [80–84]
×
×
×
×
×
×
×
×
×
×
×
×
×
×
Larger
Larger
Larger
Larger
Larger
out seeing the actual messages. The most important task in HE is preserving the format of the cipher text after completing the Eval function operation along with this size of the cipher text should be constant in order to support unlimited number of operations. Else more resources will be required resulting in to limited number of operations. Basic steps involved in homomorphic encryption is illustrated in Fig. 6. It consist of data owner, data user and cloud. First data owner encrypts his message m, Enc(m) and send it to cloud. If data user send search f () if he want to get some data from the encrypted data which is stored on the cloud. Then cloud evaluates f (m) homomorphically and returns the Enc(f (m)) to the data user. Data user computes Dec (Enc (f(m))) = f(m), and recovers f(m). In our proposed work IoT Zetta server generate key pair and then encrypt received IoT data and send it to the public cloud. When IoT data owner wants to perform some computation function over encrypted data, he sends the compute function to public cloud. After receiving compute function public cloud performs homomorphic operation on encrypted IoT data using the compute function and return the encrypted result to IoT data owner. Finally, IoT data owner recovers his data with his own key and obtain the actual data. Here compute function on the public cloud side does not require the private key of the IoT data owner and it allows numerous operations i.e. addition and multiplication on the encrypted data. Moreover, homomorphic encryption is expensive in terms of heavy computations and complicated mathematical operations involved in it. However, to overcome this various HE schemes in the literature found and are precisely categorized in three main categories with respect to the number of permitted tasks on the encrypted data such as: (1) Partially Homomorphic Encryption (PHE) [86–92] permits only one type of operation i.e. either addition or multiplication with an indefinite number of times. (2) Somewhat Homomorphic Encryption (SWHE) [93–96] permits some types of operations with a limited number of times. (3) Fully
Computing over encrypted spatial data generated by IoT
1. Encrypts his message m, Enc(m) and send it to cloud
3. Evaluate F() homomorphically
2. Search F() 5. Compute Dec (Enc( F(m))) =F(m), and recovers F(m)
4. Returns Enc(F(m)) Data Owner
Cloud
Data User
Fig. 6 A simple data owner, data user and cloud HE scenario
Homomorphic Encryption (FHE) [97–100] allows unlimited number of operations with unlimited number of times. All the above categories are summarized in the Table 7. The best part of PHE is that mathematical operation does not require the encrypted values to be decrypted or reveal during the entire process. The data remains confidential to the database operator or cloud service providers, who are operating on it and even if anyone come to know about the data then the compromised data gives only encrypted values. PHE techniques can achieve either addition or multiplication but not both. SWHE and FHE techniques are still not efficient in terms of memory usage, computation cost and costly bootstrapping operation [97]. One more significant limitation of FHE schemes is its memory requirement; ciphertext and public key sizes are very large in order to guarantee sufficient security to prevent against possible attacks. One more limitation of FHE technique is bootstrapping operation is very costly and it is required to reduce the noise that is incurred at the time of homomorphic operation performed on ciphertext. In our proposed method we used PHE Schemes i.e. Paillier [92]. Following three phases are involved in it.
For every message p, the number t is randomly chosen and the encryption works as follows: c = E( p) = f p t d (mod d 2 ) c. Decryption process For a proper ciphertext c < n 2 , decryption is
D(c) =
H (cλ (mod d 2 )) mod d = p, H ( f λ (mod d 2 ))
(2)
where private key pair is (a, b).
Homomorphic Property is E( p1 ) ∗ E( p2 ) = ( f p1 t1d (mod d 2 )) ∗( f p2 t2d (mod d 2 )) = f p1+ p2 (t1 ∗ t2)d (mod d 2 ) = E( p1 + p2).
a. Zetta server It is responsible for computing (Public, Private) Keys pair: First it chose large primes a and b such that gcd (ab, (a − 1)(b − 1)) = 1, than it compute d = ab and λ = lcm (a − 1, b − 1). Then, it selects a random integer f X d∗2 by inspecting whether gcd (d, H ( f λ mod d 2 )) = 1, where the function H is defined as H (u) = (u − 1)/d for every u from the subgroup X d∗2 which is multiplicative subgroup of integers modulo d 2 instead of d. Lastly, public key is (d, f ) and the c is (a, b) pair. b. Encryption process Data, which is collected from ADS-B Receiver/ flightradar24 are encrypted.
(1)
(3)
Above show that Paillier’s partial homomorphic encryption scheme is homomorphic over addition. Along with this it allows extra basic operation on plaintext p1, p2 ∈ xd∗2 by using the encrypted plaintext E( p1), E( p2) and public key pair (d, f ): E( p1) ∗ E( p2) (mod d 2 ) = E( p1 + p2 (mod d)) E( p1) ∗ f
p2
(mod d ) = E( p1 + p2(mod d)) 2
E( p) (mod d ) = E(kp (mod d)) k
2
(4) (5) (6)
123
S. V. Limkar, R. K. Jha Table 7 Summary of homomorphic encryption schemes S. no.
Homomorphic encryption schemes
Evaluate function
Hardness of the scheme
Homomorphic property
1
PHE [86–92]
Permits only one type of operation with an unlimited number of times.
RSA [86] is based on the factoring problem of the product of two large prime numbers [101]
Only multiplication
Goldwasser-Micali [87] is based on the quadratic residuosity problem [101]
Only addition
Elgamal [88]is based on the certain problems in discrete logarithm [101]
Only multiplication
Benaloh [89] is based on the higher residuosity problem [101]
Only addition
Paillier [92] is based on the composite residuosity problem [101]
Addition and some extra property such as scalar multiplication
Young [93] is based on garbled circuit Sander [94] is based on NC1 circuit
Arbitrary Polynomially many AND and one OR/NOT gate.
Boneh [95] is based on the subgroup decision problem.
Unlimited addition and only one multiplication
Ishai [96] is based on the branching programs
Arbitrary
Gentry [97] is based on the Ideal Lattice. Van Dijk [98] is based on the Approximate- Greatest Common Divisor (AGCD) problems [102]
Unlimited number of addition and multiplication.
2
3
SWHE [93–96]
Permits some types of operations with a limited number of times.
FHE [97–100]
Permits unlimited number of operations with unlimited number of times.
Brakerski and Vaikuntanathan [99] is based on learning with error.
Table 8 Sample extracted data of single flight position Aircraft ID
SG512
Aircraft Source airport
Aircraft destination airport
Aircraft current position
Latitude
Longitude
Latitude
Longitude
Latitude
Longitude
Altitude
17.254
78.431
23.07724
72.634651
22.39902
74.91647
3600
22.36407
75.0325
3600
22.33255
75.13809
3600
22.29863
75.2504
3600
22. 29765
75.2104
3600
22. 29548
75.2711
3600
22. 28713
75.2394
3600
Above Eqs. (4), (5), (6) shows, in what way the operation computed on encrypted data affects the plaintexts. Applying the encryption to the Table 8. Resulting encrypted values are shown in the “Appendix” section.
5.2 Proposed work The proposed work is divided into four stages i.e. input dataset, data partitioning, parallel architecture and lastly parallel tree construction which is shown in Fig. 7.
123
5.2.1 Input dataset We consider a set of moving aircrafts in a three dimensional space. The rough goal is to make knowledge about their real time current locations and keep the information up-to-date. The aircrafts are uniquely identified through a field called aircraft ID and their positions are represented in 3-dimensional space, i.e. in the form of a coordinates Latitude, Longitude, Altitude represented in the form of x, y, z coordinates. The data received from the moving aircraft are encrypted using partial homomorphic encryption scheme and this encrypted data is indexed into a 3-dimensional R tree and its variant’s
Computing over encrypted spatial data generated by IoT Fig. 7 Proposed work
1 Input
2 Data Paroning
3 Parallel Architecture Used
4 Tree Construcon
R Tree
Chunk-1/ RDD-1
Chunk-2/ RDD- 2
R+ Tree
( On Disk)
Encrypted Spaal Data Chunk-3/ RDD- 3
R* Tree
( In Memory)
Chunk-4/ RDD- 4
i.e. R+ tree and R* tree. Indexing is made in order to get the efficient results of spatial queries over encrypted spatial data and real time updates of aircrafts positions. ADS-B transponder broadcast data twice in a second. The data which we receive contains lots of information, but we are fetching and feeding only data related to position updates, i.e. Latitude, Longitude and Altitude with respect to each aircraft which is uniquely identified by its unique ID called as aircraft ID. Figure 8 shows a flowchart of fetching aircraft data from ADS-B receiver which is used in the first simulation scenario. Following Fig. 9 shows a flowchart for fetching aircraft data from flightradar24.com which is used in second simulation Scenario.
Flightradar24 dataset [56] In Fig. 10, blue colored, highlighted text depicts its flight ID, dark red colored depicts aircraft destination latitude and Longitude, purple colored depicts aircraft source latitude and Longitude, green colored depicts trail, which give the exact real time position of the aircraft and it gets updated twice in a second. As we receive enormous data from flightradar24. This whole data is not required for our proposed work, so we fetched only required data which is shown in Table 8, the last three columns are continuously updating for twice in every second. Received geodetic latitude (x), ¯ longitude ( y¯ ) and altitude (¯z ) which is of 2-by-1 vector need to be converted into 3-by-1 vector of Earth-centered, Earth-fixed (ECEF) position (a) ¯ 3by-1 vector [103–105]. The ECEF position is calculated from the geocentric latitude at mean sea-level (γs ) and longitude using:
Start
Detects ADS-B signal
Zea Driver decode the received ADS-B packet
If packet type is airbone msg
If packet type is idenficaon
N
Y
N
Y
Retrieve Latude, Longitude, Altude
Retrieve the flight source & desnaon informaon
Convert this info into JSON format
Apply Paillier’s Paral Homomorphic Encrypon
Upload above Encrypted data to Zea Server
Stop
Fig. 8 Fetching aircraft data from ADS-B receiver
123
S. V. Limkar, R. K. Jha
as equatorial radius, with following equations.
Start
γs = a tan((1 − p)2 tan x) E2 ks = 1 + (1/(1 − p)2 − 1)sin2 (γs )
Collect the data from flightradar24.com through web streaming API
(8) (9)
5.2.2 Data partition Parse received data into flight ID, Latude, Longitude, altude, source, desnaon
Convert data into JSON
Apply Paillier’s Paral Homomorphic Encrypon
Upload above Encrypted data to Zea Server
Sleep for 100 ms
Fig. 9 Fetching aircraft data from flightradar24.com through web streaming
⎤ ⎤ ⎡ ks cos γs cos y + z cos x cos y al a¯ = ⎣ am ⎦ = ⎣ ks cos γs sin y + z cos x sin y ⎦ an ks sin γs + z sin x ⎡
(7)
Flattening ( p) ¯ is used to define the geocentric latitude from ¯ mean sea-level and (ks ) as the radius at a surface point, ( E)
Various geospatial data partitioning techniques exist [106– 115]. In order to efficiently process large volume of geospatial data on multiple machines, we need to divide this dataset into small equal pieces and each individual piece need to be processed on individual machines. The performance of parallel computing is totally based on the quality of partition. To achieve good load balancing there is a highly necessity of implementing good partition strategy. Basically, there are two main types of partitioning strategy exist, i.e. space oriented and data oriented. In space oriented partitioning techniques generates sub partitions by spatially decomposing the current partition boundary into two equal subspaces. Decision to partition the data is purely based on the space, which suffers from data skews. Data oriented partitioning technique generates sub partitions by finding a cut such that each resulting sub partition contains roughly the same amounts of data. In our scenario we used second approach, i.e. data oriented partition. The following algorithm is used to extract the aircraft’s position dataset and partition it into a number of small parts so that we can build the index of R trees in a parallel way. Two inputs are required for this, first one is the JSON Data and the second one is the number, which specifies how many parts we need to divide this dataset. Output is the main JSON dataset partitioned into a number of parts. This partitioning algorithm is common in both the parallel framework i.e.Hadoop and spark data partitioning.
{ flight: "SG512",snapshot_id: null,status: "airborne",dep_schd: 1461416100,arr_schd: 1461422100,departure: 1461416572, arrival: 1461421824, eta: 1461421824, to_iata: "AMD", to_city: "Ahmedabad, Ahmedabad International Airport",to_pos: [23.07724,72.634651], to_tz_code: "IST", to_tz_offset: "5.50", to_tz_name: "India Standard Time",from_iata: "HYD",from_city: "Hyderabad, Hyderabad Rajiv Gandhi International Airport", from_pos: [17.254,78.431], from_tz_code: "IST", from_tz_offset: "5.50",from_tz_name: "India Standard Time", airline: "SpiceJet",aircraft: "Boeing 737-7GL",airline_url: "http://www.flightradar24.com/ data/airplanes/sej-sej",image: "http://d31asmy75eposw.cloudfront.net/200/5/84101_1450424561_tb.jpg?v=0",imagelink: "http:// external.flightradar24.com/redir.php?url=http%3A%2F%2Fwww.jetphotos.net%2Fphoto%2F8159824",copyright: "BHASKAR GURUPRASAD",imagesource: "Jetphotos.net",image_large: "http://doygv6stby4r3.cloudfront.net/ image.php?params=U0ZHNjBaQzg0TnlvQWJLd3htODRqTldQWXFyWjYxM0M4T1dFamIrUmNJYnE1SlIxejY3NWRrUnAwSENoc DhUdWMvT2V1M1hVTmZ2OFlXSHNIdDVsN3hySFFLWDBGOURxN2FSdFU1VC8yeDdEaEVzdmpzRVdUaVFsYjQzRiszbDQ=", imagelink_large: "http://external.flightradar24.com/ redir.php?url=http%3A%2F%2Fwww.jetphotos.net%2Fphoto%2F8159824",copyright_large: "BHASKAR GURUPRASAD", q: "98086a0", trail: [22.39902,74.91647,3600, 22.36407, 75.0325,3600,22.33255,75.13809,3600,22.29863,75.2504,3600],first_timestamp: 0 } Fig. 10 A piece of JSON data received from Flightradar24.com for single flight position update
123
Computing over encrypted spatial data generated by IoT
Input: JSON Data, number of parts Output: array 1. 2. 3. 4. 5. 6.
Flight[] = fetch the flight records from raw JSON data. Count= Flight.Length/no of parts FlightArray[]= empty for (i=0;i < flight.length;i++) FlightArray[i % no of parts].add(flight[i]) Return flightArray;
5.2.3 Parallel architecture We worked on three different parallel architecture, i.e. Apache MapReduce, Apache Spark on disk computation and Apache Spark in-memory computation along with traditional single core architecture i.e. sequential architecture, these are explained as follows. (i) MapReduce framework [116] MapReduce is developed at google. The main intention behind the evolution of MapReduce framework is to support parallel processing across a large amount of data on clusters of commodity hardware [116]. With this framework, user can concentrate on computation part only, without worrying about parallelization. In this framework whole task is broken into many small parts of two types i.e. map and reduce. The mapper input is in the form of (key, value) pair, and generates intermediates (key, value) pair. Then, this framework combines the values with same intermediate keys. The reducer takes input as the same intermediate key, along with all the intermediate values which belongs to that key and generates a list of output values. Master of this framework manages all the process such as it activates the input file partitioning, assign mappers and reducer, coordinates the communication between mappers are reducers. Open source implementation of this framework is called as Hadoop [117]. Hadoop is the combination of MapReduce framework and distributes file system (HDFS). (ii) Apache Spark [118] MapReduce [116] is limited to batch processing, storm [119] is limited to real time stream processing, Impala [120] is limited to interactive processing, Giraph [121] is limited to graph processing. Every solution we need a specialized system which is very costly i.e. it requires various types of resources and team of developers, also it is very tough to manage. So there is a need to have a general purpose system which can handle all the requirements such as; powerful engine for processing data in real time and perform in memory analytics. Apache Spark is the powerful open source engine that provides real time, interactive, graph, in-memory as well as batch processing. The
main component of Apache Spark are spark context, cluster manager, worker node which is explained in details as follows. Spark context It is defined in the driver program. It is the main entry point of the spark application. It interacts with the cluster manager. It specifies how to access the cluster using this we can control the spark applications. RDDs are created using spark context only. Cluster manager Spark context interacts with the cluster manager and get the resources. Cluster manager keeps track of all the resources like, how many cores of CPUs, how much memory, how many executors are available in the overall cluster. Worker node Inside worker node executors is running. Executors are running inside the Java Virtual Machine (JVM) and there will be multiple task running inside the executors. In the diagram more than one task is running inside each executor. Task are the fundamental units of execution. Cache is the important property of the spark. It can cache the data, i.e. keeps the data in-memory. Cache is also used for inmemory computation. The complete architecture of Apache Spark is shown in Fig. 11. Difference between Apache Spark and MapReduce framework is shown in Table 9. Difference between traditional stream processing and Apache Spark stream processing gives clear idea of the Apache Spark which is given in Table 10.
5.2.4 Tree construction First we constructed R tree and its variants over encrypted geospatial data. Construction of minimum bounding rectangles i.e. MBR’s are not proper as the size of the ciphertext is too large. If we still want to construct the MBR’s then the size of the ciphertext gets truncated and resulting MBR’s not getting at appropriate size. So to overcome this, we decrypted only latitide, longitude and altitude values (we have not decrypted aircraft ID) for the merging of two R tree and its Variants. We proposed a novel algorithm for merging of two R tree and its variants, which works on reducer of both the frameworks i.e. MapReduce and Apache Spark, by using a plane sweep algorithm [122]. Consider there are two different R trees such as R_Tree and S_Tree. R_Tree MBRs are a1 , a2 , a3 , a4 where as S_Tree MBRs are a5 , a6 , a7 , a8 , a9 . In order to combine these two trees we need to find out the pairs of rectangles who are intersecting with each other. Figure 12 shows the Set P of the rectangle (MBR’s of R_Tree and S_Tree) which are aligned on the X-Axis. A plane sweep algorithm is used to get the pair of intersected rectangle.
123
S. V. Limkar, R. K. Jha
Cluster Manager
Spark Context
Worker Node
Worker Node
Executer
Cache
Executer
Cache
Task
Task
Task
Task
Datanode
Driver Program
Datanode
HDFS
Fig. 11 Architecture of Apache Spark
Input: Set P of the rectangle which are aligned on the X-Axis Output: Pair of intersected rectangles For every pair (ai , a j ) of rectangles S, i = j If (ai ∩ a j = Φ) then report (ai , a j ) Output: (a1 , a2 ) (a1 , a3 ) (a1 , a8 ) (a3 , a4 ) (a3 , a5 ) (a3 , a9 ) (a4 , a5 ) (a7 , a8 ) Here the output is in two groups, first group is the pair of rectangles who are intersecting with each other and second group is the rectangles who are not intersecting with another. Once we find the pair of rectangle, then we combine those rectangles to form a larger MBR, if following condition get satisfied. m≤n≤M where n is the number of aircrafts. If the above condition is not satisfied, then check whether n > M, if yes, then call either condense or split algorithm of R tree and its variants. Then build R tree and its variants using a bottom up approach, while merging MBR’s check condition m <= M at each level of the tree. Insert unpaired rectangles MBR’s using a standard insertion algorithm of R trees and its variants [57–59] Hence, in this way we will get final R tree and Its Variants.
123
Pseudocode Step 1 Receive encrypted data and partition it into equal size of a chunk’s and RDD’s and store it into HDFS. Step 2 If encrypted data is partitioned into chunks, then call MapReduce else call Apache Spark in-memory or Apache Spark On-Disk computation. Step 3 Workers i.e. mappers are assigned to each chunk’s and RDD’s. Step 4 Each mapper first decrypt the received encrypted data (only Lat, Long and Alt). After decrypting each mapper constructs R tree and its variants from received data using an existing standard algorithm [57–59]. Step 5 Workers i.e. reducers are assigned to merge the trees. Step 6 Find a pair of leaf nodes MBR’s, who are intersecting with other leaf nodes MBR’s using plane sweep algorithm. Step 7 For set of paired rectangles who satisfies the condition m ≤ n ≤ M, merge these rectangles and form a new MBR. Step 8 If condition in earlier step is not satisfied, then check n > M, if this condition satisfies then call condense / split algorithm of R tree and its variants. Step 9 Build R tree and its variants using a bottom up approach, while merging MBR’s check condition m <= M at each level of the tree. Step 10 Insert all unpaired MBR’s one by one using a standard insertion algorithm of R trees and its variants.
Computing over encrypted spatial data generated by IoT Table 9 Difference between Apache map reduce and Apache Spark S. no.
Characteristics
Apache Spark
MapReduce
1
Speed
It is superfast cluster computing tool. It executes applications up to 100× faster in memory and 10× faster on disk compared to Hadoop MapReduce. This is achieved by minimizing the number of read/write cycle to disk and intermediate data is stored in memory.
It does not leverage maximum utilization of memory of the Hadoop cluster, that’s why it reads and writes the data from disk leading to decrease the processing speed
2
Difficulty
It is easy to program because it has lots of high level operators with Resilient Distributed Dataset (RDD), which makes it very easy to work
It needs to write a hard code for each and every single task, which makes it very difficult to work
3
Easy to manage
It can perform batch processing, interactive machine learning and streaming all inside the same cluster. No need to manage different modules as requirements arise
Designed to perform only batch processing application. So it depends on other modules such as storm, giraph, impala, etc. Resulting it into very complex system to manage multiple components
It is very easy to manage multiple components 4
Real time analysis
It can process real time data. Its asset lies into efficiently processing live streams as it arrives in real time with zero or no latency
It fails to process real time data, as it was intended to accomplish batch processing on big data
5
Latency
It provides almost zero latency computing framework. As it involves in memory computation
It is a very high latency computing framework. As it involves on disk computation
6
Interactive mode
Interactive data can be processed
It does not support interactive mode
7
Storage used
Memory/disk
Disk
8
Streaming
Spark streaming is used to process real time data
It can only support batch processing mode
9
Ease of use
It is very simple and easy to use; its simple abstraction, makes user to process data using high level APIs
It is a complex and difficult to use; makes user to process data using low level APIs. Which necessitates lots of hand coding for each and every task
10
Replication
It does not implement replication
It implement replication, default factor is 3
11
Recovery
It allows recovery of partitions on failed nodes by re-computation of the DAG. It also support check pointing style of Hadoop MapReduce
It is resilient to system faults and failures, resulting in to very high fault tolerant system
12
Queries handled
It can handle both real time as well as historical queries
It can handle only historical queries
13
Ease
It is very easy to program because it does not require any abstractions
It is very difficult to program because it requires abstractions
14
Caching
It caches its partial or intermediate results across its memory of distributed workers
It caches its partial or intermediate results across its disk of distributed workers
15
Security
It is still in its infancy
It has more security features
16
Data modification
It allows user to modify the data in real time through streaming
It allows user to process batch of stored data.
Table 10 Traditional stream versus spark streaming processing
S. no. Characteristics
Tradition stream processing
Spark stream processing
1
Processing
One record at a time
Discretizes data into small chunks
2
Load Balancing
Static
Dynamic
3
Failure recovery
Slower
Faster
4
Unification of streaming, batch and interactive workloads
Not possible
Possible
5
Advanced analytics
Not possible
Possible
6
Latency
High
Low
7
Performance
Low throughput
High throughput
123
S. V. Limkar, R. K. Jha
(b) Parallel tree construction on Apache Spark on disk computation
a6 a7
It consist of eight steps that are as follows.
a5 a8
a4
a3 a1
a2
a9
Fig. 12 Finding intersected rectangles i.e. MBRs of two tree
We used three different parallel architectures for parallel construction of R tree and its variants. Flow of the proposed pseudocode is given in Fig. 13. Detailed explanation of each architecture is as follows. (a) Parallel tree construction on MapReduce framework It consist of eight steps that are as follows. i. First all encrypted data is stored on to the Hadoop distributed file system (HDFS). ii. Whole encrypted data is partitioned into equal size of chunks using data oriented partitioning technique. iii. Workers i.e. mappers are assigned to each and every chunk. iv. Each mapper first decrypt the received encrypted data (only Lat, Long and Alt). After decrypting each mapper constructs R tree and its variants on received data using an existing algorithm given in [57–59] and stores its results into HDFS. Here mapper constructs three different tree, i.e. R tree, R+ tree and R* tree. v. Again, workers i.e. reducers are assigned. vi. In this step, each reducer will take the input from exactly two mappers and combines these two input trees generated by mappers, by using our tree merging algorithm. Results are stored in HDFS. If reducer gets two R tree constructed by two mappers then it reduces into one R tree. Same for R+ tree and R* tree. vii. If still we don’t get the final tree then earlier steps i.e. V followed by VI is repeated, till we get final tree. viii. Final tree is stored into HDFS. The complete flow of the proposed algorithm for the merging of two R tree and its variants using MapReduce is shown in Fig. 14.
123
i. First all encrypted data is stored on to the Hadoop distributed file system (HDFS). ii. Whole encrypted data is partitioned into a series of very small, deterministic batches of equal size called as RDD, using data oriented partitioning technique. Each batch of data is treated as RDDs in spark and processes them using RDD operations. iii. Workers i.e. mappers are assigned to each RDDs. iv. Each mapper first decrypt the received encrypted data (only Lat, Long and Alt). After decrypting each mapper constructs R tree and its variants on received data using an existing algorithm given in [57–59] and stores its results into HDFS. Here mapper constructs three different tree, i.e. R tree, R+ tree and R* tree. v. Again, workers i.e. reducers are assigned. vi. In this step, each reducer will take the input from exactly two mappers and combines these two input trees generated by mappers, by using our tree merging algorithm. Results are stored in HDFS. If reducer gets two R tree constructed by two mappers then it reduces into one R tree. Same for R+ tree and R* tree. vii. If still we don’t get the final tree, then earlier steps i.e. V followed by VI is repeated, till we get final tree. viii. Final tree is stored into HDFS. The complete flow of the proposed algorithm for the merging of two R tree and its variants using Apache Spark on disk computation is similar to Fig. 15 except data partitioning steps i.e data is partitioned in to RDD’s. In Apache Spark on-disk computation intermediate and final results are stored onto distributed shared memory which is slower then Random Access Memory (RAM) and faster than Disk. (c) Parallel tree construction on Apache Spark on disk computation It consist of eight steps that are as follows. i. First all encrypted data is stored on to the Hadoop distributed file system (HDFS). ii. Whole data is partitioned into a series of very small, deterministic batches of equal size called as RDD, using data oriented partitioning technique. Each batch of data is treated as RDDs in spark and processes them using RDD operations. iii. Workers i.e. mappers are assigned to each and every RDDs.
Computing over encrypted spatial data generated by IoT
Start Received data partitioned into equal size of chunk’s and RDD’s If data is partitioned into chunks then call MapReduce else call Apache Spark in-memory and Apache Spark On-Disk computation Workers i.e mappers are assigned to each chunk’s and RDD’s Each mapper construct R Tree and its variants after decrypting ( Only latitude, longitude, altitude) the received encrypted data using existing standard algorithm Workers i.e reducers are assigned to merge the trees Find pair of leaf nodes MBR’s, who are intersecting with each other using plane sweep algorithm
Paired rectangles
N
n>M Y Call Condense / split algorithm of R Tree and its variants
Unpaired rectangles
Validate m<=n<=M Y Merge and form a new MBR
Build R Tree and its variants using bottom up approach, while merging MBR’s check condition m<=M at each level of tree N
K>=0 Y
Insert K unpaired MBR’s using standard insertion algorithm of R Trees and its variants. K--
Final R Tree and Its Variants
End Fig. 13 Flow of proposed work
123
S. V. Limkar, R. K. Jha
Input Data 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0
1
Hadoop Distributed File System (HDFS) 2
Chunk-1
Chunk-2
0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0
Chunk-3
0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0
Mapper-1
Chunk-4
0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0
Mapper-2
Mapper-3
0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0
Workers (Mappers) are assigned
4
Each mapper generates R tree and its variant's and stores its result in to HDFS
5
Workers (Reducers) are assigned
6
Each reducer merges R tree and its variant's and stores its result in to HDFS
Mapper-4
Reducer-2
Received Data is paroned in to number of equal size called as Chunks
3
Distributed File System (HDFS)
Reducer-1
Encrypted Input Data is passed to HDFS
Distributed File System (HDFS)
Reducer-3
7
Workers (Reducers) are assigned
Distributed File System (HDFS)
8
Fig. 14 Flow of algorithm for the merging of two R tree and its variants using MapReduce
123
Each reducer merges R tree and its variant's and stores its result in to HDFS
Computing over encrypted spatial data generated by IoT
Distributed File System (HDFS) 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0
1
0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0 0 00 0 0 00 0 00 0
RDD-1
RDD-2
0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0
RDD-3
0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0
Mapper-1
RDD-4
0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0
Mapper-2
Mapper-3
0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0 0 00 0 0 00 0 0 0
2
Workers (Mappers) are assigned
4
Each mapper generates R tree and its variant's and stores its result in to memory
5
Workers (Reducers) are assigned
6
Each reducer merges R tree and its variant's and stores its result in to memory
Mapper-4
Reducer-2
Persistent Memory
Reducer-3
Received Data is paroned in to number of equal size called as RDD
3
Persistent Memory
Reducer-1
Encrypted Data is passed from HDFS
7
Persistent Memory 8
Workers (Reducers) are assigned
Each reducer merges R tree and its variant's and stores its result in to memory
Fig. 15 Flow of algorithm for the merging of two R trees and its variants using Apache Spark in-memory computation
123
S. V. Limkar, R. K. Jha
iv. Each mapper first decrypt the received encrypted data (only Lat, Long and Alt). After decrypting, each mapper constructs R tree and its variants on received data using an existing algorithm given in [57–59] and stores its results into memory. Here mapper construct three different tree i.e. R tree, R+ tree and R* tree. v. Again, workers i.e. reducers are assigned. vi. In this step, each reducer will take the input from exactly two mappers and combines these two input trees generated by mappers, by using our tree merging algorithm. Results are stored in HDFS. If reducer gets two R tree constructed by two mappers then it reduces in to one R tree. Same for R+ tree and R* tree. vii. If still we don’t get the final tree, then earlier steps i.e. V followed by VI is repeated, till we get final tree. viii. Final tree is stored inside the memory. The complete flow of the proposed algorithm for the merging of two R tree and its variants using Apache Spark in-memory computation is shown in Fig. 15.
6 Experimental setup In this section we discuss the experimental setup which comprises the details of the hardware specifications, Dataset, workload. Afterward, we present the results and discussions.
6.1 Hardware specifications We conducted our experiments on Dell PowerEdge T430 Tower Server 1 x Intel Xeon E5-2609v4 (1.7GHz/8C/20MB/ 85W/1866Mhz), 8 x 8GB DDR4 RAM, 64 bit SUSE Linux Enterprise 12 SP2 as a base operating system.
All results are based on this dataset.
6.3 Workload (1) MapReduce Framework (2) Apache Spark Framework
6.4 Cryptographic setting Data owner i.e. ADS-B receiver is having inbuilt shared public key of 512-bit through which it encrypts the received data and transmit it through unsecured channel to IoTZetta server whereas data user i.e. end user who uses this data is working on standard existing security i.e. HTTPS.
6.5 Software required 1. SUSE Linux Enterprise 12 SP2 as a base operating system. 2. VMware Workstation 12.5.6 Player for Linux 64-bit Operating Systems [124] 3. Cloudera CDH 5.8 [125]: It is the fastest, easiest, and highly secured Hadoop platform. It is an end to end application to manage and deploy Hadoop clusters. It comes with a stack of services such as Hbase, Hive, Sqoop, etc., inside its clusters. It uses REST APIs. It allows us deploy and manages Hadoop clusters and it services programmatically. The main advantage is that it saves a considerable amount of time and efforts if we have to do the same things like deploying a cluster. A cluster consist of three nodes, i.e. one master and two slaves. The master node runs most of the services and therefore requires higher memory. It is shown in Fig. 16.
6.2 Dataset We used two different sources of data sets. 6.2.1 ADS-B receiver [123] We used GNS 5890 ADS-B Receiver USB-Stick which works on ADS-B 1090 MHz. It is capable to capture the real-time aircraft’s position within its range of around 300 Km, but it can capture only few aircraft locations. Since the number of aircraft captured by these receivers are very few as compared to flightradar24. 6.2.2 Flightradar24 [56] It can capture thousands of aircraft’s real-time position, each aircrafts location is updated twice in every second. We considered this dataset.
123
Hadoop 1 (Master) Name Node Task Tracker Cloudera Manager 3-CPU, 16 GB RAM IP Address: 192.168.1.1
Hadoop 2 (Slave) Secondary Name Node Data Node Task Node 1-CPU, 8 GB RAM IP Address: 192.168.1.2
Hadoop 3 (Slave) Data Node Task Node 1-CPU, 8 GB RAM IP Address: 192.168.1.3
Hadoop 4 (Slave) Data Node Task Node 1-CPU, 8 GB RAM IP Address: 192.168.1.4
Fig. 16 Cloudera cluster
Computing over encrypted spatial data generated by IoT
7 Threat model
8 Security analysis
In this paper we focused on IoT application, where data is collected from sensor and stored on third party cloud. We considered three parties involved in it: Cloud, Data owner, Data user and an Identity Provider (IDP) which is present inside the zetta server to certify the public key of each user.
8.1 Security analysis and attack model Definition 1 if there exist a plain text Pt and cipher text Ct. Cryptosystem said to be perfectly secrete only if, for any g ∈ Pt and any h ∈ Ct, PR (pt = pt | ct = Ct) = PR (pt = pt).
7.1 Threats We considered the cloud is honest but at the same time it is curious i.e. it will follow the rules correctly, along with this it tries to learn from the stored data. This is a standard model where violation of rules results in to penalizing the service provider. However, the internal adversary always zealous to passively analyze the data without being detected. Few chances cannot be ignored, where external adversary can also gain access to the encrypted data if system security gets compromised. We consider cloud-side threats which are due to curious administrator and threats while data is transferred from edge devices to cloud through unsecured communication media.
7.2 Assumptions Along with the presence of the honest but curious cloud, our proposed system assumes that the Identity Provider (IDP) properly validates users’ or devices identity key pairs. The IDP is the basic requirement in multi-party systems, which is also a trustworthy for both the involved parties i.e. external and an internal party. Moreover, proposed system assumes that the system work properly and never ever share user keys to malicious parties or third parties. Finally, we assume standard state of the art security mechanisms is present for IoT device security and communication amongst all the party is over secure channel.
7.3 Guarantees Our proposed system guards the privacy of IoT device generated data, which is stored in the cloud. However, if IoT device gets compromised and group keys are disclosed, only data of the compromised device and the affected group are disclosed. Proposed system is also secure against, Eavesdropping, Man in the middle attack (MitMA), Data interception and theft, because the data which is transferred from edge devices to cloud is in the form of cipher text. Even if this data gets available to the adversary, it can’t be decrypted without the proper keys. Our proposed system cryptographically avoids unauthorized data access
Attack model Known ciphertext only attacks is considered for analyzing the security of our proposed scheme. In this model, the IoTZetta cloud server is only holding encrypted dataset, which is outsourced bye the data owner. In this type of attacks, attacker has the cipher text of several messages, all of which have been encrypted with the encryption algorithm. Attacker’s job is to recover the plain text of as many messages as possible or better yet to deduce the key used to encrypt the messages, in order to decrypt other messages encrypted with the same keys. If we received ciphertext value = 0, then plaintext pt = 0. Because of this, all the times pt = 0 is encrypted as non-zero element in X d n . Theorem 1 The proposed privacy model maintains perfect secrecy of the IoT generated data. Proof Let {γ1 , . . . . . . γn } be a basis of X d n over X d . For any cipher text ct ∈ X d∗n , there exist j ∈ {1 . . . n} such as T1n (γ j ct) = 0. For any plain text pt ∈ X d and any a j ∈ X d , i ∈ {1..n}\{ j}, define a j = pt −
i∈{1...n}{ j}
ai T1n (γi ct) (T1n (γ j ct))−1
n n n n Then, we have i=1
an i T1 (γi ct) = pt,n i.e., T1 ( i=1 ai γi ct) = pt. Define β = i=1 ai γi , then T1 (βct) = pt. For pt = X d∗ , there are d n−1 possible β ∈ X d∗n such that T1n (βct) = pt. If pt = 0 and ai = 0 for i ∈ {1 . . . n}\{ j}, then a j = 0, which leads to β = 0. So, for pt=0, there are only d n−1 − 1 possible β Fd∗n such that T1n (βct) = m. Note: We assumed pt = 0 is encrypted as a nonzero element in X d n . Hence, for any pt ∈ X d and ct ∈ X d∗n . As root ct ∈ X d∗n is arbitrarily selected, then Ar(ct = ct|pt = pt) =
d n−1 1 . 1 = d n−1 , if pt ∈ X d∗ , d n−1 d n−1 n−1 d −1 1 . 1 = d n−1 , if pt − 0. d n−1 d n−1 −1
As for any m ∈ X d , PR(pt = pt) = 1/d, for any cipher text ct ∈ X d∗n , PR(ct = ct) = Ar(pt = pt) PR ct = ct|pt = pt) pt∈X d
1 = n . d −1
123
S. V. Limkar, R. K. Jha
With the help of Bayes’ theorem, for any pt ∈ X d and any ct ∈ X d∗n , PR(pt = pt|ct = ct) PR(ct = ct|pt = pt) PR(pt = pt) 1 = = = PR(pt = pt) PR(ct = ct) d Therefore, from the definition of partial homomorphic encryption, our proposed technique maintains perfect secrecy. Where, P R = Conditional Probability X d = finite field, d = Power of a prime, T1n (.) = Trace function required for a linear transformation from X d n onto X d .
8.2 Load balancing The performance of parallel computing is totally based on the quality of partition. If the data is not partitioned properly, load imbalance arises. To achieve good load balancing there is a highly necessity of implementing good partition strategy. In the proposed work, the data oriented partitioning method is used to decide equal partition cut and then with the proposition 1 balancing of load can be achieved. Proposition 1 If the received geospatial data A is suitably huge, suitable amount of geospatial data will be allotted to each mapper once the data is partitioned. Proof “A is suitably huge” it means the data which we received after partitioning is adequate for consideration of parallel distribution across the different mapper. The partitioning algorithm partition the received geospatial data into M mappers. Let Mi be i-th mapper. Then M denotes the number of partitioned geospatial data in the mapper and can be shown as follows: A |Mi | = M
(10)
If case where geospatial data distribution is defined properly by the generated geospatial dataset D. M represents the total number of geospatial dataset, can be represented as follows: ||Mi || = |Mi |
|D| |D| = A M
(11)
8.3 Speedup The important reason behind using parallelization of a job is to run the job faster i.e. to speedup. Speedup in parallel
123
program can be assessed with the support of Amdahl law [126]. Next we examine the speedup of the algorithm. The speedup can be represented in terms of following equation as follows: Speedup (S) =
Ts Tp
(12)
where Ts denotes the total time required for sequential execution of the job, Tp denotes the total time required for execution of parallel jobs. In our algorithm, the sequential execution time Ts can be represented as follows: Ts = TPart + TEncr
(13)
Where TPart represents the time required for partitioning the received data and TEncr represents the time requires for encryption of the received data. The parallel execution time Tp can be represented as: Tp =
Tmerg Ttree Tintr Ttree Tdecr + + + + + Tover M M r r r
(14)
where Tdecr is the time required for decrypting only latitude, longitude and altitude, Ttree is the time required to construct R tree and its variants, Tintr represents the time required for finding the pair of intersected rectangles, Tmerg is the time required for merging the two rectangles to form a larger rectangle. Tover is the overhead caused by parallel programing which consist of the cumulative time spent on communication, and synchronization, M indicates number of mappers concurrently working on data and r indicates number of reducers concurrently working on data. The speedup can be represented by combining the Eqs. (13) and (14), as follows: S=
TPart + TEncr Tdecr M
+
Ttree M
+
Tintr r
+
Tmerg r
+
Ttree r
+ Tover
(15)
In MapReduce framework, every time the data are read from the disk and the result is stored back to the Hadoop Distributed File System (HDFS) after a specific iteration and again data gets read from the HDFS for the next iteration. This whole process involved multiple read/write operations, which consume a lot time, resulting in too high latency. Whereas in case of Apache Spark in memory computation, data is read from the memory, resulting in reduced latency compared to MapReduce framework. As we know parallel algorithm may exhibit dissimilar performance on different parallel architecture. That is applicable to MapReduce framework and Apache Spark architecture.
Computing over encrypted spatial data generated by IoT
8.4 Scalability Scalability is a measure of a parallel system’s capacity to increase speedup in proportion to the number of processors. Parallel system is said to be scalable if its efficiency is increasing by increasing the number of processor and problem size at the same time otherwise it is called as non-scalable parallel system. X=
Sequential execution time with data size D Parallel execution time with problem size D
(16)
Equation (16) is similar to the definition of the speedup which is given in Eq. (12). In speedup problem size is fixed, whereas in scalability, speed is fixed.
8.5 Efficiency The efficiency is defined as the ratio of speedup to the number of processors. Efficiency measures the fraction of time for which a processor is usefully utilized. E=
Ts S = P pT p
(17)
where S is the speedup, P is the number of processors.
8.6 Quality of generated R-tree and its variants on parallel architecture For enhancing the search performance [84] there is high necessity to have a minimal overlap and a minimal area because these factors are highly responsible for path pruning in R trees and its variants. In our case, all MBR’s are in three dimensions and area of cuboid is calculated in terms of volume. Total volume of leaf nodes is considered as the most crucial factor for the overall performance of R-tree and its variants in case of bottom up method. We can calculate the volume and overlap parameters for newly constructed R tree and its variants on parallel architecture given by following: Volume (R) =
r
Volume(Ri .M B R)
i=1 n
Volume (LN) =
i=1 r
Volume (LNi .M B R)
(18)
(19)
and its variants generated by each individual mappers (the number of children) of R, Ri is the i-th child node of R. Volume (LN) is the total area of leaf nodes, n is the number of leaf node, LN iis the i-th leaf node. All above parameters are calculated by considering spatial objects are in three dimensions, i.e. latitude, longitude and altitude. The quality attributes of constructed R tree and its variants over encrypted data and normal data on MapReduce and Apache Spark architecture by considering various numbers of partitions, number of reducers and sequential process (SP) which works on a single CPU are interpreted in Tables 11, 12, 13 and 14. Where D represents on disk computation, M = represents in memory computation and SP represents sequential processing on single CPU.
Table 11 Notations and its meaning Notations
Meaning
a, b
Secrete key pair
d, f
Public key pair
t
Block size used for the message
X d∗2
Additive subgroup of integers modulo d 2
p, Pt
Plain text
E
Encryption
Ct
Cipher Text
Xd
Finite field, where d is power of prime
T1n
Trace function
PR
Conditional Probability
x
Latitude
y
Longitude
z
Altitude
M
Mapper
A
Data recevied after partitioning
Mi
i-th mapper
r
Reducer
A
Geospatial partitioned data
D
Geospatial dataset
TPart
Time required for partitioning received data
TEncr
Time requires for encrypting received data.
Ts
Time required for sequential execution
Tp
Time required for parallel execution
Ttree
Time required to construct R tree and its variants
Tintr
Time required for finding the pair of intersected rectangles
(20)
Tmerg
Time required for merging the two rectangles to form a larger rectangle.
where R indicates the root, volume (R) represents the total area of the root node, r is the total number of sub R-trees
Tover
Overhead time caused by parallel programing
Overlap(R) =
r
Volume(Ri .M B R ∩ R j .M B R)
i=1 j=i+1
123
123
–
SP
–
19
15
11
7
3
Number of reducers
5
5
5
5
5
5
5
5
5
5
5
5
12,142
12,175
12,160
12,152
12,148
12,144
12,142
12,175
12,160
12,152
12,148
12,144
Encrypted
Normal
Normal
Encrypted
Leaf Nodes
Height of the tree
Number of mappers
4
8
12
16
20
–
Partitions
4
8
12
16
20
SP
–
19
15
11
7
3
Number of reducers
5
5
5
5
5
5
5
5
5
5
5
5
12740
12784
12768
12760
12755
12751
12740
12784
12768
12760
12755
12751
Encrypted
Normal
Normal
Encrypted
Leaf Nodes
Height of the tree
Table 13 Quality parameters of the constructed R+ tree over encrypted and normal data on Apache Spark (D)
16
12
12
20
8
8
16
4
4
20
Number of mappers
Partitions
Table 12 Quality parameters of the constructed R tree over encrypted and normal data on MapReduce
6.89E+16
6.107E+16
6.102E+16
6.99E+16
6.94E+16
6.89E+16
Normal
5.35E+16
5.58E+16
5.53E+16
5.48E+16
5.45E+16
5.40E+16
Encrypted
6.89E+16
6.107E+16
6.102E+16
6.99E+16
6.94E+16
6.89E+16
Encrypted
Root Node area
5.35E+16
5.58E+16
5.53E+16
5.48E+16
5.45E+16
5.40E+16
Normal
Root Node area
5.75E+21
5.98E+21
5.94E+21
5.88E+21
5.83E+21
5.77E+21
Encrypted
6.40E+21
6.62E+21
6.55E+21
6.49E+21
6.46E+21
6.41E+21
Normal
6.40E+21
6.62E+21
6.55E+21
6.49E+21
6.46E+21
6.41E+21
Encrypted
Leaf node area
5.75E+21
5.98E+21
5.94E+21
5.88E+21
5.83E+21
5.77E+21
Normal
Leaf node area
3.62E+18
3.79E+18
3.74E+18
3.69E+18
3.66E+18
3.62E+18
Normal
Overlap
8.01E+18
8.27E+18
8.17E+18
8.12E+18
8.08E+18
8.02E+18
Normal
Overlap
3.62E+18
3.79E+18
3.74E+18
3.69E+18
3.66E+18
3.62E+18
Encrypted
8.01E+18
8.27E+18
8.17E+18
8.12E+18
8.08E+18
8.02E+18
Encrypted
S. V. Limkar, R. K. Jha
3.39E+19 3.39E+19 1.09E+22 1.01E+17 1.01E+17
1.09E+22
3.68E+19
3.60E+19 3.60E+19
3.68E+19 1.33E+22
1.27E+22 1.27E+22
1.33E+22
1.21E+17
1.27E+17
1.21E+17
1.27E+17
3.44E+19
3.51E+19 3.51E+19 1.23E+22 1.13E+17 1.13E+17
1.23E+22
3.40E+19 3.40E+19
3.44E+19 1.16E+22 1.16E+22 1.08E+17 1.08E+17
1.09E+22 1.03E+17 1.03E+17
1.09E+22
The results obtained through the simulated environment shows that the proposed framework has achieved a security of the IoT generated spatial data while it is in transit through unsecured communication medium and when it is stored on public cloud. We performed various operation on encrypted data such as indexing it in to R tree and its variants, parallel merging of two encrypted trees on three different parallel architecture, i.e. MapReduce framework, Apache Spark on disk computation and Apache Spark in-memory computation, inserting new spatial objects and shooting range query on encrypted data etc. We compared operation performed on encrypted spatial data and non-encrypted spatial data. As we know each node of R tree and its variants contains at least m and at most M entries, where m ≤ M/2. In the implementation we considered m = 3 and M = 10. Dataset of 10,000 flights are used. Following parameters are used to evaluate the performance.
12,368 5 5
12,368
13,423
13,406 13,406
13,423
5
5
5
5
We considered geospatial data is of three dimensional, i.e. latitude, longitude and altitude, therefore the resulting leaf node MBRs are is of cuboid type and its volume is measured in cubic centimeters. This volume is same for all the trees i.e. tree constructed on sequential architecture and all three parallel architectures which is shown in Figs. 17, 18, 19, 20.
– – SP
15
19
16
Fig. 17 Leaf node MBR’s area of sequential R trees and its variants constructed on encrypted data and normal data
20
16
13,398 11 12
12
13,398 5 5
13,393
13,389
5 5 7
13,393
5 5 3
8 8
4
13,389
Normal Encrypted Normal
Number of reducers
4
20
Normal Encrypted Normal Encrypted Normal
Leaf nodes Height of the tree
Encrypted
Leaf node area Root Node area
9 Result
9.1 Volume
Number of mappers Partitions
Table 14 Quality parameters of the constructed R* tree over encrypted and normal data on Apache Spark (M)
Overlap
Encrypted
Computing over encrypted spatial data generated by IoT
Fig. 18 Leaf node MBR’s area in MapReduce framework, for R trees and its variants constructed on encrypted data and normal data
123
S. V. Limkar, R. K. Jha
Fig. 19 Leaf node MBR’s in Apache Spark (on-disk computation), for R trees and its variants constructed on encrypted data and normal data
Fig. 23 CPU Utilization for indexing encrypted data and normal data in R∗ tree on sequential and parallel frameworks
9.3 Execution time
Fig. 20 Leaf node MBR’s area in Apache Spark (in-memory computation), for R trees and its variants constructed on encrypted data and normal data
Execution time is the time required for creating an index of all the trees. It calculates as the difference between start and end time required for creating the index of all the trees i.e. trees constructed on traditional sequential architecture and all three parallel architectures is shown in Figs. 24, 25 and 26.
9.2 CPU utilization CPU utilization is one of the vital parameter. The results shown the maximum CPU utilization for generating the index in all the trees, i.e. trees constructed on traditional sequential architecture and all three parallel architectures is shown in Figs. 21, 22 and 23.
Fig. 24 Execution time for indexing encrypted data and normal data in R tree using sequential and parallel frameworks
Fig. 21 CPU Utilization for indexing encrypted data and normal data in R tree using sequential and parallel frameworks
Fig. 25 Execution time for indexing encrypted data and normal data in R+ tree using sequential and parallel frameworks
Fig. 22 CPU Utilization for indexing encrypted data and normal data in R+ tree using sequential and parallel frameworks
Fig. 26 Execution time for indexing encrypted data and normal data in R∗ tree using sequential and parallel frameworks
123
Computing over encrypted spatial data generated by IoT
Fig. 27 Execution time for inserting encrypted data and normal data and reconstructing R tree using sequential and parallel frameworks
Fig. 28 Execution time for inserting encrypted data and normal data and reconstructing R+ tree using sequential and parallel frameworks
Fig. 29 Execution time for inserting encrypted data and normal data and reconstructing R∗ tree using sequential and parallel frameworks
Fig. 30 Execution time for updating encrypted data and normal data and reconstructing R tree using sequential and parallel frameworks
Fig. 31 Execution time for updating encrypted data and normal data and reconstructing R+ tree using sequential and parallel frameworks
Fig. 32 Execution time for updating encrypted data and normal data and reconstructing R* tree using sequential and parallel frameworks
9.6 Memory utilization 9.4 Execution time for inserting data Execution time for inserting data is the average time required for inserting 1000 aircrafts data and reconstructing all the trees i.e. trees constructed on traditional sequential architecture and all three parallel architectures is shown in Figs. 27, 28 and 29.
Memory utilization is one of the important factor. The results shown the maximum memory utilization for generating the index of all the trees i.e. trees constructed on traditional sequential architecture and all three parallel architectures is shown in Figs. 33, 34 and 35.
9.5 Execution time for updating data In the proposed framework position of the aircraft is not fixed at all the time rather it is continuously changing. We randomly deleted 1000 aircrafts data and inserted 1000 aircrafts data and calculated the execution time required for the reconstruction of all the trees i.e. trees constructed on traditional sequential architecture and all three parallel architectures is shown in Figs. 30, 31 and 32.
Fig. 33 Memory utilization for indexing the encrypted data and normal data in R tree using sequential and parallel frameworks
123
S. V. Limkar, R. K. Jha
Fig. 34 Memory utilization for indexing the encrypted data and normal data in R+ tree using sequential and parallel frameworks
Fig. 35 Memory utilization for indexing the encrypted data and normal data in R∗ tree using sequential and parallel frameworks
Fig. 38 Overlap area in Apache Spark on-disk computation framework, for R trees and its variants on encrypted data and normal data
Fig. 39 Overlap area in Apache Spark in-memory computation framework, for construction of R trees and its variants on encrypted data and normal data
9.8 Search time 9.7 Overlap This is the addition of all overlap area in cubic centimeters. As in our case we have used three dimensional MBR (Latitude, Longitude, and Altitude) i.e. cuboid. Its unit is cubic centimeters. Overlap of all the trees shown in in Figs. 36, 37, 38 and 39.
This is the average time required to execute spatial range query i.e. find all objects A where DISTANCE (Query, A) ≤ Dist. Example, find all the aircrafts that are within the vicinity of 150 km from Pune International Airport location. We executed this spatial range query and evaluated its performance on various trees and is shown in Figs. 40, 41 and 42.
Fig. 36 Overlap area in sequential R trees and its variants on encrypted data and normal data
Fig. 40 Execution time for spatial range query on R tree using sequential and parallel architectures on encrypted data and normal data
Fig. 37 Overlap area in MapReduce framework, for R trees and its variants on encrypted data and normal data
Fig. 41 Execution time for spatial range query on R+ tree using sequential and parallel architectures on encrypted data and normal data
123
Computing over encrypted spatial data generated by IoT
Fig. 42 Execution time for spatial range query on R∗ tree using sequential and parallel architectures on encrypted data and normal data
Table 15 give the comparative analysis of various parameter on sequential and parallel approach.
10 Conclusion and future scope Storing data on the cloud leads to compromising the security and privacy because it is very easy for an inside attacker (e.g. cloud administrator) to compromise the security. Efficient index generation and improved quality of the generated index are the two major important factor that need to be considered before firing spatial query over encrypted spatial data. For indexing encrypted geospatial data we used R tree and its variants, because these data structure are emerged as practically most efficient indexing methods, widely recognized and adopted in geospatial data management. In our proposed algorithm Mappers of both the parallel
framework i.e. MapReduce and Apache Spark uses the existing algorithm of R tree and its variants for generating index on received partitioned data in parallel, whereas reducer of both the parallel framework uses our proposed tree merge algorithm for merging the input received from exactly two mappers. We generated index on three different parallel framework i.e. MapReduce, Apache Spark on-disk computation and Apache Spark in-memory computation along with sequential processing. The result guarantees two things, first one is that, the generated R tree and its variants build by using our methods underneath parallel architectures exhibits the same features as built on sequential processing and second one is that, the proposed framework has maintained perfect secrecy while IoT generated spatial data is in transit from edge devices to cloud through unsecured communication medium and when it is stored on the cloud. From the result it’s clear that Apache Spark in-memory computation build index much faster than all the existing methods also Apache Spark in-memory computation execute spatial query over encrypted data much faster than all the existing methods. Specifically result of the spatial range query execution time over encrypted spatial data of our proposed scheme is extremely efficient which takes slightly more time as taken by normal spatial range query executed over non-encrypted spatial data in sequential processor. Our scheme is not only efficient, but also highly compatible and scalable with IoT generated spatial data.
123
123
Search time in seconds
Overlap area in cubic centimeter
389746 170547
11 0.049 0.041
R* tree
1100
R* tree
3.E+18
R* tree
0.05 0.059
R + tree 0.035 R* tree
0.031
0.62
0.38
3.E+18
4.E+18
R tree
3.E+19
3.E+19
R + tree 4.E+18
R tree
1900
200
1800
200
R + tree 200
0.031
R + tree 0.037
0.37
0.75
R* tree
10
25
R tree
49
36
153264
R + tree 13
R tree
65766
tree 89541
R* tree
R+
789905
52.16
R* tree R tree
48.25
48.22
55.30
45.85
R + tree 44.16
R tree
5.77E+21
R* tree
Memory utilization for indexing the data in kilobytes R tree
Execution time for updating data in seconds
Execution time for inserting data in seconds
Execution time required for indexing in seconds
% CPU utilization for indexing
6.41E+21
5.77E+21
1.09E+22
1.09E+22
R tree
Leaf node MBR’s area in cubic centimeter
Normal data Encrypted data
Existing sequential approach
R + tree 6.41E+21
Tree
Parameters
Table 15 Comparative result analysis of various parameter on sequential and our parallel approach
Spark on-disk
Spark in-memory
0.019
0.02
0.21
3.E+18
4.E+18
3.E+19
1200
1200
1200
0.08
0.021
0.21
7
7
26
37648
46611
220177
26.42
25.89
20.11
5.77E+21
6.41E+21
1.09E+22
0.03
0.035
0.39
3.E+18
4.E+18
3.E+19
4500
3200
2100
0.021
0.038
0.39
10
12
35
92062
97362
426831
31.25
32.79
24.41
5.77E+21
6.41E+21
1.09E+22
0.01
0.015
0.15
3.E+18
4.E+18
3.E+19
1800
1600
1300
0.01
0.012
0.18
3
4
18
25170
35467
165081
17.67
16.93
10.08
5.77E+21
6.41E+21
1.09E+22
0.018
0.025
0.30
3.E+18
4.E+18
3.E+19
4800
4400
2800
0.012
0.025
0.25
5
10
25
71368
76532
377199
21.75
24.34
15.90
5.77E+21
6.41E+21
1.09E+22
0.001
0.001
0.08
3.E+18
4.E+18
3.E+19
2000
1900
1800
0.001
0.001
0.6
1
2
3
11793
22734
82541
4.42
4.23
2.52
5.77E+21
6.41E+21
1.09E+22
0.002
0.002
0.10
3.E+18
4.E+18
3.E+19
7200
4800
4000
0.002
0.002
0.16
3
5
5
34435
41094
153364
7.91
5.09
7.33
5.77E+21
6.41E+21
1.09E+22
Normal data Encrypted data Normal data Encrypted data Normal data Encrypted data
MapReduce
Our parallel approach
S. V. Limkar, R. K. Jha
Computing over encrypted spatial data generated by IoT Acknowledgements The authors gratefully acknowledge the support provided by 5G and IoT Lab, DoECE, and TBIC-Shri Mata Vaishno Devi University, Katra, Jammu. The authors would also like to thank the anonymous reviewers for their constructive comments and suggestions to improve the quality of the paper.
Appendix After applying homomorphic encryption to the Table 8, resulting encrypted values are shown in below Table 16.
Compliance with ethical standards Conflict of interest The authors declare that there is no conflict of interests regarding the publication of this paper.
Table 16 Applying encryption to Table 8 Aircraft ID
Aircraft source airport
Aircraft destination airport
Aircraft current position
Latitude
Longitude
Latitude
Longitude
Latitude
Longitude
Altitude
1220886865
4251025105
1567333394
2129486626
2519921547
2987917025
1163618781
1910389146
2868937858
9682651704
1435182114
1041861778
7184290325
9516938177
7773015603
4495732329
1316966928
4892636756
2092556997
4104748832
6116938822
8445362362
4826278130
7194732874
0745662142
8289524879
0150244004
3090843299
8460374580
2172920444
8393801741
9046828839
3739428125
5014746323
4788821720
6757045648
9925318482
7015772396
3815702029
0625050836
8760148509
5902076007
4553443152
0066838485
2250033541
6851707862
3672438568
2392748049
6276284981
1797748520
5709309060
3552928135
3582500799
7089754083
7251083377
6696958841
7176735752
2393455953
3355780280
8743382219
3397588776
2293940618
2626369142
8893782374
6216562191
7120736942
0655148063
6549954115
7668974071
6098661878
1908589847
6982760753
8012242975
2874453564
0995787089
5705353719
7666862063
3062613633
5360548230
2263884708
6557340787
1660193403
2389939376
4393714095
3715314246
3158438671
5429767602
7377567651
3781767318
1920275304
0474253812
8676368525
7359944333
8311752195
4692561188
2438838894
3722838783
0692031506
9479246171
8352445010
5619547398
6874868068
7960604525
7466184513
2332524546
8025217349
7102909238
0175455617
1554701929
0950650120
2385104514
4311011320
1505755695
7422469201
4716302597
8015045562
9843494984
9502154994
6752927699
6735645777
4836588405
5006399032
9668271166
4873211800
7399386716
0354752138
2334892151
4381468911
6787915198
4185976485
7379972070
5429500738
2990970242
7213771946
9720543447
8849436368
2013750628
9086019661
7868009556
5623416714
5212781483
1603179167
1788587037
0132464519
4722562281
5878402030
7808202115
4732594586
2895130929
3287217424
0937887446
7323014318
4936027517
8949066326
0146221435
2651549160
2911685771
3250025784
5963467025
6351552813
8789164587
1887331521
5221063394
5688253281
4388879369
0835890586
3972060319
4175711311
3644282213
3199408974
2410548360
3480753196
6380374205
0175711576
2614797438
0229451340
6642795948
6337060215
8891221478
9294781489
1160304531
7982748514
7172717348
6220765000
2078068566
6113362098
0266570205
2681045740
7722937774
9539284026
8303818559
2285466404
6572346853
8681697903
9075978213
1590292486
4568818784
7253372657
3404559356
1430218770
6725634065
6818866082
0502102057
8550190088
2912460363
9924457973
9045544448
1241114893
3355163443
4891290975
9456860694
0858142940
6761441163
9169398002
3047709930
7890760506
9823349064
4317514898
0457844345
1536987469
9919098195
6570932877
4935305821
0732566815
6219436204
2672066993
7061225951
8142814338
9125910730
9324367170
2548615780
0063245885
0966984456
6391451526
5176575156
5939127212
5903856806
5258878299
1557046839
6677247197
85935527
5530031
34006999
2999932
44022247
51350812
8795601
22134506
123
S. V. Limkar, R. K. Jha
References 1. Atzori, L., Iera, A., & Morabito, G. (2010). The internet of things: A survey. Computer Networks, 54(15), 2787–2805. 2. Gubbi, J., Buyya, R., Marusic, S., & Palaniswami, M. (2013). Internet of things (IoT): A vision, architectural elements, and future directions. Future Generation Computer Systems, 29(7), 1645–1660. 3. http://www.gartner.com/newsroom/id/3598917. 4. https://www.idc.com/. 5. Eagle, N., & Greene, K. (2014). Reality mining: Using big data to engineer a better world (1st ed.). Cambridge: The MIT Press. 6. Madden, S. (2012). Going big on spatial data: A mobile systems perspective. In Keynote at 20th ACM SIGSPATIAL international conference on advances in geographic information systems, Redondo Beach, CA. 7. Lee, J. G., & Kang, M. (2015). Geospatial big data: Challenges and opportunities. Journal of Big Data Research, 2(2), 74–81. 8. Dasgupta, A. (2013). Big data: The future is in analytics. http://www.geospatialworld.net/Magazine/MArticleView. aspx?aid=30512, April 2013, Geospatial World. 9. Lapkin, A. (2012). Hype cycle for big data, 2012. http://www. gartner.com/document/2100215, July 2012. 10. http://aws.amazon.com/solutions/case-studies/. 11. Song, D. X., Wagner, D., & Perrig, A. (2000). Practical techniques for searches on encrypted data. In Proceedings of IEEE SP (pp. 44–55). 12. Shahabi, C., Fan, L., Nocera, L., Xiong, L., & Li, M. (2015). Privacy-preserving inference of social relationships from location data: A vision paper. In Proceedings of ACM SIGSPATIAL GIS (pp. 1–4). 13. http://aws.amazon.com/. 14. https://cloud.google.com/. 15. Katz, J., & Lindell, Y. (2007). Introduction to modern cryptography: Principles and protocols. Boca Raton, FL: CRC Press. 16. TechRadarTM : Internet of things security, Q1 2017, A mix of new and existing technologies help secure IoT deployments, January 19, 2017. 17. Song, D., Wagner, D., & Perrig, A. (2000). Practical techniques for searches on encrypted data. In Proceedings of IEEE S&P’00. 18. Curtmola, R., Garay, J., Kamara, S., & Ostrovsky, R. (2006). Searchable symmetric encryption: Improved definitions and efficient constructions. In Proceedings of ACM CCS (pp. 79–88). 19. Kamara, S., Papamanthou, C., & Roeder, T. (2012). Dynamic searchable symmetric encryption. In Proceedings of ACM CCS (pp. 965–976). 20. Cash, D., Jarecki, S., Jutla, C., Krawczyk, H., Ro¸su, M.-C., & Steiner, M. (2013). Highly-scalable searchable symmetric encryption with support for Boolean queries. In Proceedings of CRYPTO (pp. 353–373). 21. Pappas, V., et al. (2014). Blind seer: A scalable private DBMS. In Proceedings of IEEE SP (pp. 359–374). 22. Cash, D., et al. (2014). Dynamic searchable encryption in verylarge databases: Data structures and implementation. In Proceedings of NDSS (pp. 1–16). 23. Stefanov, E., Papamanthou, C., & Shi, E. (2014). Practical dynamic searchable encryption with small leakage. In Proceedings of NDSS. 24. Wong, W. K., Kao, B., Cheung, D. W. L., Li, R., & Yiu, S. M. (2014). Secure query processing with data interoperability in a cloud database environment. In Proceedings of ACM SIGMOD (pp. 1395–1406). 25. Lai, J., Zhou, X., Deng, R. H., Li, Y., & Chen, K. (2013). Expressive search on encrypted data. In Proceedings of ACM ASIA CCS (pp. 243–251).
123
26. Sun, W., et al. (2013). Privacy-preserving multi-keyword text search in the cloud supporting similarity-based ranking. In Proceedings of ACM AISA CCS (pp. 71–82). 27. Ghinita, G., & Rughinis, R. (2014). An efficient privacypreserving system for monitoring mobile users: Making searchable encryption practical. In Proceedings of ACM CODASPY’14. 28. Wang, B., Li, M., Wang, H., & Li, H. (2015). Circular range search on encrypted spatial data. In Proceedings of IEEE CNS’15. 29. Zhu, H., Lu, R., Huang, C., Chen, L., & Li, H. (2015). An efficient privacy-preserving location based services query scheme in outsourced cloud. IEEE Transactions on Vehicular Technology, 65(9), 7729–7739. 30. Wang, B., Li, M., & Wang, H. (2016). Geometric range search on encrypted spatial data. IEEE Transactions on Information Forensics and Security, 11(4), 704–719. 31. Wang, B., Li, M., & Xiong, L. (2018). FastGeo: Efficient geometric range queries on encrypted spatial data. In IEEE transactions on dependable and secure computing (Vol. PP, No. 99, p. 1). 32. de Berg, M., Cheong, O., van Kreveld, M., & Overmars, M. (2008). Computational geometry: Algorithms and applications. New York: Springer. 33. Atallah, M. J., & Du, W. (2001). Secure multi-party computational geometry. In Proceedings of the international workshop on algorithms and data structures (pp. 165–179). 34. Du, W., & Atallah, M. J. (2001). Secure multi-party computation problems and their applications: A review and open problems. In Proceedings of new security paradigms workshop (pp. 13–22). 35. Narayanan, A., Thiagarajan, N., Lakhani, M., Hamburg, M., & Boneh, D. (2011). Location privacy via private proximity testing. In Proceedings of NDSS. 36. Šed-enka, J., & Gasti, P. (2014). Privacy-preserving distance computation and proximity testing on earth, done right. In Proceedings of ACM ASIA CCS (pp. 99–110). 37. Goldreich, O. (2004). Foundations of cryptography: Basic applications (Vol. 2). Cambridge: Cambridge University Press. 38. Azad, M. A., Bag, S., & Hao, F. (2017). M2M-REP: Reputation of machines in the internet of things. In Proceedings of the 12th international conference on availability, reliability and security (ARES ’17). ACM, New York, NY, USA, Article 28, 7. 39. Ajmal, M., Bag, S., Tabassum, S., & Hao, F. (2017). privy: Privacy preserving collaboration across multiple service providers to combat telecoms spam. In IEEE transactions on emerging topics in computing. 40. Azad, M. A., & Bag, S. (2017). Decentralized privacy-aware collaborative filtering of smart spammers in a telecommunication network. In Proceedings of the symposium on applied computing (SAC ’17) (pp. 1711–1717). New York, NY: ACM. 41. Khan, M. A., & Salah, K. (2018). IoT security: Review, blockchain solutions, and open challenges. Future Generation Computer Systems, 82, 395–411. 42. Al-Fuqaha, A., Guizani, M., Mohammadi, M., Aledhari, M., & Ayyash, M. (2015). Internet of things: A survey on enabling technologies, protocols, and applications. In IEEE communications surveys and tutorials (Vol. 17, no. 4, pp. 2347–2376). Fourthquarter. 43. Botta, A., de Donato, W., Persico, V., & Pescapé, A. (2014). On the integration of cloud computing and internet of things. In 2014 International conference on future internet of things and cloud, Barcelona (pp. 23–30). 44. https://xively.com/. 45. https://thingspeak.com/. 46. http://iot-toolkit.com/. 47. http://open.sen.se/. 48. http://www.iot-a.eu/. 49. http://www.nimbits.com/.
Computing over encrypted spatial data generated by IoT 50. http://www.netlabtoolkit.org/learning/tutorials/iot-cloudservices/. 51. http://www.openiot.eu/. 52. http://www.zettajs.org/. 53. https://sites.google.com/site/opensourceiotcloud/. 54. Jiang, L., Da Li, X., Cai, H., Jiang, Z., Fenglin, B., & Boyi, X. (2014). An IoT-oriented data storage framework in cloud computing platform. IEEE Transactions on Industrial Informatics, 10(2), 1443–1451. 55. https://nodejs.org/. 56. http://www.flightradar24.com/. 57. Guttman, A. (1984). R-trees: A dynamic index structure for spatial searching. In Proceedings of ACM SIGMOD international conference on management of data. 58. Sellis, T., Roussopoulos, N., & Faloutsos, C. (1987). The R+-tree: A dynamic index for multidimensional objects. In Proceedings of the 13th international conference on very large databases (VLDB). 59. Beckmann, N., Kriegel, H.-P., Schneider, R., & Seeger, B. (1990). The R*-tree: An efficient and robust access method for points and rectangles. In Proceedings of ACM SIGMOD international conference on management of data. 60. Nievergelt, J., et al. (1984). The grid file: An adaptable, symmetric multikey file structure. ACM Transactions on Database Systems, 9(1), 38–71. 61. Kalojanov, J., & Slusallek, P. (2009). A parallel algorithm for construction of uniform grids. In Proceedings of the 1st ACM conference on high performance graphics—HPG ’09 (New York, New York, USA, 2009) (p. 23). 62. Yang, K., et al. (2007). In-memory grid files on graphics processors. In Proceedings of the 3rd international workshop on data management on new hardware—DaMoN ’07 (p. 1). 63. Finkel, R. A., & Bentley, J. L. (1974). Quad trees a data structure for retrieval on composite keys. Acta Informatica, 4(1), 1–9. 64. Samet, H. (2006). Foundations of multidimensional and metric data structures. Burlington: Morgan Kaufmann Publishers Inc. 65. Yi, K. (2008). Encyclopedia of algorithms. New York: Springer. 66. Aljafer, H., Malik, Z., Alodib, M., & Rezgui, A. (2014). A brief overview and an experimental evaluation of data confidentiality measures on the cloud. Journal of Innovation in Digital Ecosystems, 1(1–2), 1–11. 67. Acar, A., Aksu, H., Selcuk Uluagac, A., & Conti, M. (2017). A survey on homomorphic encryption schemes: theory and implementation. CoRR arXiv:1704.03578 68. Liu, J., Mesnager, S., & Chen, L. (2016). Partially homomorphic encryption schemes over finite fields. In C. Carlet, M. Hasan, & V. Saraswat (Eds.), Security, privacy, and applied cryptography engineering. SPACE 2016. Lecture notes in computer science (Vol. 10076). Cham: Springer. 69. Kaosar, M. G., Paulet, R., & Yi, X. (2012). Fully homomorphic encryption based two-party association rule mining. Data and Knowledge Engineering, 76, 1–15. 70. Rivest, R. L., Shamir, A., & Adleman, L. (1978). A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, 21(2), 120–126. 71. Paillier, P. (1999). Public-key cryptosystems based on composite degree residuosity classes. In J. Stern (Ed.), Advances in cryptology (EUROCRYPT 1999). Lecture notes in computer science (Vol. 1592, pp. 223–238). Berlin: Springer. 72. Mani, M., Shah, K., & Gunda, M. (2013). Enabling secure database as a service using fully homomorphic encryption: Challenges and opportunities. arXiv preprint arXiv:1302.2654. 73. Lee, C.-C., Chung, P.-S., & Hwang, M.-S. (2013). A survey on attribute-based encryption schemes of access control in cloud environments. International Journal of Network Security, 15(4), 231–240.
74. Liu, X., Zhang, Y., Wang, B., & Yan, J. (2013). Mona: Secure multi-owner data sharing for dynamic groups in the cloud. IEEE Transactions on Parallel and Distributed Systems, 24(6), 1182– 1191. 75. Liu, Q., Wang, G., & Wu, J. (2010). Efficient sharing of secure cloud storage services. In 2010 IEEE 10th international conference on computer and information technology (CIT) (pp. 922–929). IEEE. 76. Yu, S., Wang, C., Ren, K., & Lou, W. (2010). Attribute based data sharing with attribute revocation. In Proceedings of the 5th ACM symposium on information, computer and communications security (pp. 261–270). ACM. 77. Yang, X., Du, W., Wang, X., & Wang, L. (2013). Fully secure attribute-based encryption with non-monotonic access structures. In 2013 5th international conference on intelligent networking and collaborative systems (pp. 521–527). Xi’an. 78. Shuci, J., Weibin, G., & Guisheng, F. (2017). Hierarchy attributebased encryption scheme to support direct revocation in cloud storage. In 2017 IEEE/ACIS 16th international conference on computer and information science (ICIS) (pp. 869–874). 79. Prabhakar, D. M., & Joseph, K. S. (2013). A new approach for providing data security and secure data transfer in cloud computing. IJCTT, 4(5), 1202–1207. 80. Seo, J. H., & Cheon, J. H. (2011). Fully secure anonymous hierarchical ldentity based encryption with constant size cipher texts. IACR Cryptology ePrint Archive, 2011, 21. 81. Baek, J., Newmarch, J., Safavi-Naini, R., & Susilo, W (2004). A survey of identity-based cryptography. In Proceedings of the 10th Annual Conference for Australian Unix User’s Group (AUUG 2004) (pp. 95–102). 82. Parsha, S. K., & Pasha, M. K. (2012). Enhancing data access security in cloud computing using hierarchical identity based encryption (hibe). International Journal of Scientific and Engineering Research, 3(5), 5. 83. Wang, G., Liu, Q., & Wu, J. (2011). Achieving fine-grained access control for secure data sharing on cloud servers. Concurrency and Computation Practice and Experience, 23(12), 1443–1464. 84. Dong, X., Yu, J., Luo, Y., Chen, Y., Xue, G., & Li, M. (2013). Achieving secure and efficient data collaboration in cloud computing. In 2013 IEEE/ACM 21st international symposium on quality of service (IWQoS) (pp. 1–6). IEEE. 85. Fontaine, C., & Galand, F. (2007). A survey of homomorphic encryption for nonspecialists. EURASIP J. Inf. Secur. Article 15(January 2007). 86. Rivest, R. L., Shamir, A., & Adleman, L. (1978b). A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, 21(2), 120–126. 87. Goldwasser, S., & Micali, S. (1982). Probabilistic encryption & how to play mental poker keeping secret all partial information. In Proceedings of the fourteenth annual ACM symposium on Theory of computing (pp. 365–377). ACM. 88. ElGamal, T. (1985). A public key cryptosystem and a signature scheme based on discrete logarithms. In G. R. Blakley, & D. Chaum (Eds.), Advances in cryptology (CRYPTO 1984). Lecture notes in computer science (Vol. 196, pp. 10–18). Berlin: Springer. 89. Benaloh, J. (1994). Dense probabilistic encryption. In Proceedings of the workshop on selected areas of cryptography (pp. 120–128). 90. Naccache, D., & Stern, J. (1998). A new public key cryptosystem based on higher residues. In Proceedings of the 5th ACM conference on computer and communications security (pp. 59–66). ACM. 91. Okamoto, T., & Uchiyama, S. (1998). A new public-key cryptosystem as secure as factoring. In Advances in cryptology— EUROCRYPT’98 (pp. 308–318). Springer.
123
S. V. Limkar, R. K. Jha 92. Paillier, P. (1999). Public-key cryptosystems based on composite degree residuosity classes. In Advances in cryptology— EUROCRYPT’99 (pp. 223–238). Springer. 93. Yao, A. C. (1982). Protocols for secure computations. In Proceedings of the 23rd Annual Symposium on Foundations of Computer Science (SFCS ’82) (pp. 160–164). Washington, DC: IEEE Computer Society. 94. Sander, T., Young, A., & Yung. M. (1999). Non-interactive crypto computing for NC1. In 40th annual symposium on foundations of computer science (pp. 554–566). 95. Boneh, D., Goh, E. J., & Nissim, K. (2005). Evaluating 2-DNF formulas on ciphertexts. In J. Kilian (Ed.), Theory of cryptography (TCC 2005). Lecture notes in computer science (Vol. 3378, pp. 325–341). Berlin: Springer. 96. Ishai, Y., & Paskin, A. (2007). Evaluating branching programs on encrypted data. In S. P. Vadhan (Ed.), Theory of cryptography (TCC 2007). Lecture notes in computer science (Vol. 4392, pp. 575–594). Berlin: Springer. 97. Gentry, C. (2009). A fully homomorphic encryption scheme. Ph.D. dissertation. Stanford University. 98. Van Dijk, M., Gentry, C., Halevi, S., & Vaikuntanathan, V. (2010). Fully homomorphic encryption over the integers. In Advances in cryptology—EUROCRYPT 2010 (pp. 24–43). Springer. 99. Brakerski, Z., & Vaikuntanathan, V. (2011). Fully homomorphic encryption from ring-LWE and security for key dependent messages. In Advances in cryptology—CRYPTO 2011 (pp. 505–524). Springer. 100. López-Alt, A., Tromer, E., & Vaikuntanathan, V. (2012). Onthe-fly multiparty computation on the cloud via multikey fully homomorphic encryption. In Proceedings of the forty-fourth annual ACM symposium on theory of computing (pp. 1219–1234). ACM. 101. Gjøsteen, K. (2004). Subgroup membership problems and public key cryptosystems. Ph.D. Dissertation, Norwegian University of Science and Technology. https://brage.bibsys.no/xmlui/ bitstream/handle/11250/249681/121977_FULLTEXT01.pdf? sequence=1. 102. Galbraith, S., Gebregiyorgis, S., & Murphy, S. (2016). Algorithms for the approximate common divisor problem. LMS Journal of Computation and Mathematics, 19(A), 58–72. 103. Atmospheric and space flight vehicle coordinate systems, ANSI/AIAA R-004-1992. 104. Stevens, B. L., & Lewis, F. L. (1992). Aircraft control and simulation. New York: Wiley. 105. Zipfel, P. H. (2000). Modeling and simulation of aerospace vehicle dynamics. AIAA education series. Reston, Virginia. 106. Moon, B., Jagadish, H. V., Faloutsos, C., & Saltz, J. H. (2001). Analysis of the clustering properties of the hilbert space-filling curve. IEEE Transactions on Knowledge and Data Engineering, 13(1), 124–141. 107. Zhou, X., Abel, D. J., & Truffet, D. (1998). Data partitioning for parallel spatial join processing. GeoInformatica, 2(2), 175–204. 108. Muthukrishnan, S., Poosala, V., & Suel, T. (1999). On rectangular partitionings in two dimensions: Algorithms, complexity, and applications. In ICDT (pp. 236–256). 109. Scheuermann, P., Weikum, G., & Zabback, P. (1998). Data partitioning and load balancing in parallel disk systems. The VLDB Journal, 7(1), 48–66. 110. Nguyen Thai, B., & Olasz, A. (2015). Raster data partitioning for supporting distributed GIS processing. In The international archives of the photogrammetry, remote sensing and spatial information sciences, volume XL-3/W3, 2015 ISPRS geospatial week 2015, 28 Sep–03 Oct 2015, La Grande Motte, France. 111. Aji, A., Vo, H., & Wang, F. (2015). Effective spatial data partitioning for scalable query processing. CoRR arXiv:1509.00910
123
112. Vo, H., Aji, A., & Wang, F. (2014). SATO: A spatial data partitioning framework for scalable query processing. In Proceedings of the 22nd ACM SIGSPATIAL international conference on advances in geographic information systems (SIGSPATIAL ’14) (pp. 545– 548). New York, NY: ACM. 113. Hershberger, J., Shrivastava, N., Suri, S., & Toth, C. D. (2006). Adaptive spatial partitioning for multidimensional data streams. Algorithmica, 46(1), 97–117. 114. Akdogan, A., Indrakanti, S., Demiryurek, U., & Shahabi, C. (2015). Cost-efficient partitioning of spatial data on cloud. In 2015 IEEE international conference on big data (big data), Santa Clara, CA (pp. 501–506). 115. Ferhatosmanoglu, H., Agrawal, D., Egecioglu, Ö., & El Abbadi, A. (2005). Optimal data-space partitioning of spatial data for parallel I/O. Distributed and Parallel Databases, 17(1), 75–101. 116. Dean, J., & Ghemawat, S. (2008). MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51, 107–113. 117. Apache Hadoop Project. Open source software for reliable, scalable, distributed computing [EB/OL] [2010-09-18]. http:// hadoop.apache.org. 118. http://spark.apache.org/. 119. http://storm.apache.org/. 120. https://impala.incubator.apache.org/. 121. http://giraph.apache.org/. 122. Lee, D. T., & Preparata, F. P. (1984). Computational geometry: A survey. IEEE Transactions on Computers, C–33(12), 1072–1101. 123. http://www.gns-gmbh.com/index.php?id=238&L=1. 124. https://my.vmware.com/en/web/vmware/free#desktop_end_ user_computing/vmware_workstation_player/12_0. 125. https://www.cloudera.com. 126. Hill, M. D., & Marty, M. R. (2008). Amdahl’s law in the multicore era. Computer, 41(7), 33–38.
Suresh V. Limkar is currently an assistant professor in Computer Engineering Department, AISSMS Institute of Information Technology, Pune, Maharashtra, India. He received the B.E. degree in Computer Science Engineering from Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, India, M.E. degree in Computer Engineering from Pune University, Pune, India. He is currently pursuing Ph.D. degree in Electronics and Communication Engineering at Shri Mata Vaishno Devi University, Katra, Jammu and Kashmir, India. He had published his research work in more than 25 International Conferences and Journals. His research interest includes the Internet of Things, Network Security and Location Based Services, Currently he is doing his research work on efficiently processing IoT generated data and security issues IoT data transmission. He is a member of IEEE, life member of CSI and ISTE.
Computing over encrypted spatial data generated by IoT Rakesh Kumar Jha is currently an assistant professor in school of electronics and communication department, SMVD University Katra (J&K). He is carrying out his research in WiMAX and Security issues in the laboratory ECED Lab, SMVDU. Involved research topics include WiMAX performance analysis, LBRRA, power optimization and security analysis. He has done B.Tech. in Electronics and Communication from Bhopal and M.Tech. from NIT Jalandhar, India. Received his Ph.D. degree from NIT Surat in 2013. Published more than 50 International Conference and Journal papers. His area of interest is Wireless communication, Communication System and computer network, and Security issues (Opti System). Dr. Jha’s one concept related to router of Wireless Communication has been accepted by ITU (International Telecommunication Union) in 2010. He has received young scientist author award by ITU in Dec 2010, APAN fellowship in 2011 and student travel grant from COMSNET 2012. He is a member of IEEE, GISFI and SIAM, International Association of Engineers (IAENG) and ACCS (Advance Computing and Communication Society).
123