Steganalysis of Network Packet Length Based Data Hiding

Recently, data hiding by modifying network parameters like packet header, payload, and packet length has become popular among researchers. Different a...

2 downloads 25 Views 1MB Size

Download PDF

Circuits Syst Signal Process (2013) 32:1239–1256 DOI 10.1007/s00034-012-9497-8

Steganalysis of Network Packet Length Based Data Hiding Arijit Sur · Anand S. Nair · Abhishek Kumar · Apul Jain · Sukumar Nandi

Received: 19 May 2012 / Revised: 10 September 2012 / Published online: 29 September 2012 © Springer Science+Business Media New York 2012

Abstract Recently, data hiding by modifying network parameters like packet header, payload, and packet length has become popular among researchers. Different algorithms have been proposed during the last few years which have altered the network packets in different ways to embed the data bits. Some of these algorithms modify the network packet length for embedding. Although most of the packet length based embedding schemes try to imitate the normal network traffic distribution, they have altered the statistical distribution of network packet lengths during embedding. These statistical anomalies can be exploited to detect such schemes. In this paper, a second order detection scheme for packet length based steganography has been proposed. A comprehensive set of experiments have been carried out to show that the proposed detection scheme can detect network packet length based steganography with a considerably high accuracy. Keywords Steganography · Steganalysis · Network packet length · Covert channel

A. Sur () · A.S. Nair · A. Jain · S. Nandi Department of Computer Science and Engineering, IIT Guwahati, Guwahati 781039, India e-mail: [email protected] A.S. Nair e-mail: [email protected] A. Jain e-mail: [email protected] S. Nandi e-mail: [email protected] A. Kumar Department of Mathematics and Computing, IIT Guwahati, Guwahati 781039, India e-mail: [email protected]

1240

Circuits Syst Signal Process (2013) 32:1239–1256

1 Introduction Steganography, a word which originates from ancient Greece, literally means covert writing. Basically, steganography is an art of secret communication which includes a vast array of methods of secret communication that conceal the very existence of hidden information. Traditional methods include the use of invisible inks, microdots, etc. Modern day steganographic techniques try to exploit digital media images, audio files, video files [17–19], or even network packets [2, 13]. In the recent past, network steganography has become an interesting research domain where secret data are embedded not only into a network packet’s header or its payload, but into the structure of the packet streams. Some algorithms have also used a hybrid of both techniques. Network packet header based schemes [1, 2] generally have high steganographic capacity, but they can be easily detectable [13]. The embedding of secret data in the TCP header is described in [13]. On the other hand, network packet payload based embedding schemes are less vulnerable but are of limited capacity. Another method involves hiding the data in both the header and payload of the network packet [23]. Packet stream based schemes have been proposed in [21], and a hybrid scheme is proposed in [12]. The concept of covert channels was first introduced by Lampson [11] for the transfer of secret information across the network. Internet Protocol (IP) Time to Live (TTL) fields and the record route option in the IP fields are used in [25] and [24], respectively. Another important way of hiding secret bits is to alter the length of network packets. A detailed discussion of network packet length based steganography is given in Sect. 2. A detailed survey of the network steganographic algorithms is described in [15]. Steganalysis, on the other hand, is a science to detect the presence of covert data in an innocent-looking cover object. In general, steganalysis can be categorized into two types: blind steganalysis and targeted steganalysis. Blind steganalysis algorithms are meant to detect a wide range of steganographic algorithms, and targeted steganalysis algorithms are usually intended for a specific steganographic algorithm. One of the major trends in steganalysis is to identify the embedding-sensitive features and to train a supervised learning classifier with those features to detect the statistical anomalies caused by the embedding. Steganalysis of network data hiding is an emerging research trend. Few papers have been proposed to detect different network based hiding schemes. For example, the detection scheme for IP covert is proposed by Cabuk et al. in [3]. In his earlier paper [4], Cabuk et al. proposed a detection scheme for a robust covert channel design. In [13], the authors have addressed several ways to detect network steganography including Transmission Control Protocol (TCP) initial sequence number (ISN) based hiding or IP header based hiding. In [14], we have proposed a detection scheme which can detect a prototype of length based steganographic scheme which alters User Datagram Protocol (UDP) packet lengths to embed secret bits in a chat application. In this paper, existing steganographic schemes [9, 10] are analyzed to find the distinguishing statistics which are sensitive to the steganographic embedding. It is observed that first order statistical features are adequate for detection in [9] but higher (second) order features are required for detection in [10]. A supervised learning classifier is being trained using these statistical features for the desired classification.

Circuits Syst Signal Process (2013) 32:1239–1256

1241

A comprehensive set of experiments have been carried out to validate the performance of the proposed detection scheme. The rest of the paper is organized as follows. The concept of packet length based steganography has been introduced in Sect. 2. The proposed detection scheme is introduced in Sect. 3. In Sect. 4, the experimental results have been presented. The paper is concluded in Sect. 5.

2 Packet Length Based Network Steganography Data bits are communicated across a network in the form of packets of different lengths. The packet length distribution depends on the application as well as the protocol that sends the packets across the network. For example, packets for the File Transfer Protocol (FTP) for a typical file transfer application will use the maximum size of the packet to send the data bits. It is possible to change the length of the network packets to embed the secret data. This idea is exploited in packet length based steganography. The detection of this embedding becomes difficult if the packet trace imitates the normal network flow. Network packet length based hiding schemes have been introduced by Padlipsky [16] and Girling [7] in the data link layers. In [16], Padlipsky et al. have proposed an embedding scheme where secret data is transferred by modulating the length of link layer frames. The method employs 256 different frame lengths to describe the bytes. The frame length distribution in this scheme is random and it does not imitate the normal network traffic flow. It is found that the packet length distribution has been changed hugely by these schemes and any third party can detect the presence of embedding just by looking at the embedded packet length distribution. In another algorithm, Yao Quan-zhu [20] has proposed a model which uses the length of the packets to transfer the secret message. In this scheme, both the sender and the receiver share a secret matrix, and all the cells represent unique length. The sender selects a cell randomly and finds out the length of the message to be sent, and the receiver picks up the same algorithm to find the random number from the matrix to retrieve the secret message. Recently Liping Ji et al. [9] have proposed an algorithm in which they have tried to imitate the normal network traffic flow. In this scheme, the sender and the receiver create a reference which is formed by collecting certain lengths of the normal traffic, and these lengths are used to send the secret message. A detailed discussion on the scheme in [9] is given in the next subsection. 2.1 Liping Ji’s First Scheme In this scheme, Liping Ji et al. [9] proposed a protocol-independent length based steganographic scheme which modifies the length of the network packets to hide the secret data. Authors have simulated a covert channel on Hypertext Transfer Protocol (HTTP) and analyzed its security and bandwidth. Their model is described in Fig. 1. Their covert model consist of three main characters as shown in Fig. 1. 1. Alice is the sender of a secret message. 2. Bob is the receiver of the secret message.

1242

Circuits Syst Signal Process (2013) 32:1239–1256

Fig. 1 Ji’s model for covert channel

Fig. 2 A time series for normal packet stream using the scheme in [9]

3. Wendy is the covert channel attacker, who can only monitor any fields without affecting any communication. In this covert channel, Wendy is usually a network device which checks every packet sent between Alice and Bob and tries to determine if the packet contains any secret data. In Ji’s first scheme [9], the secret message is partitioned into several groups to send. Before embedding bits by modifying packet length, a few packets are communicated normally. This set of normally communicated packets is recorded as Reference to be used in both sender and receiver. For embedding a group of binary bits of the secret message, the decimal equivalent is calculated for the group. Then a packet length is randomly chosen from the Reference using a shared secret key. The packet length is altered to carry the decimal equivalent of the binary bits in the group. The resulting packet with modified packet length is communicated to the receiver and also added to the Reference list for future correspondence. The receiver can also choose the packet length from the Reference using the same shared secret key and calculate the decimal equivalent of the secret bits from the modified packet length. A time series of packets for both normal and embedded traffic is shown in Figs. 2 and 3 where the x axis denotes the packet numbers in the time series and the y axis denotes the length of the respective packets. In this paper, all experiments are done on the Clarknet dataset [5]. From Figs. 2 and 3, it can be concluded that it is tough to distinguish between cover and stego traffic by only observing their network packet distribution. Some

Circuits Syst Signal Process (2013) 32:1239–1256

1243

Fig. 3 A time series for stego packet stream using the scheme in [9]

Table 1 Comparison of statistical properties between normal and stego traffic flow

Statistics

Normal

Liping Ji I

Liping Ji II

Correlation

1

−0.0208

−0.0035

Mean

9938.9

9937.0

9903.3

Variance

2.3266 × 108

2.6148 × 108

2.2457 × 108

Skew

2.9541

4.6687

3.051

Kurtosis

13.699

45.970

14.433

MAD

9.687 × 103

9.757 × 103

9.424 × 103

statistical analysis is needed to differentiate stego traffic from normal traffic. Table 1 shows measured statistical parameters of stego and normal traffic flow. 2.2 Liping Ji’s Second Scheme In 2009, Liping Ji [10] proposed another novel protocol-independent network covert channel based on message length using the same covert channel model as described in Fig. 1. This method also takes normal network communicating messages as the Reference. In covert transmission, each message learns from the Reference to get a similar length distribution where the Reference (say R) is the normal packet (length) flow between sender and receiver. Reference(R) is sorted in increasing order and then divided into 2w buckets where w secret bits are to be embedded by altering the length of a single packet. A packet is randomly picked from the bucket whose number is the decimal equivalent of the (w bits) secret bits. Let S = s0 s1 . . . sk−1 be a k-bit binary string representing the secret message that Alice will send to Bob. The secret message is usually partitioned into several subgroups to send. Let Wi be the ith subgroup of S, and let Wi = Si∗w , Si∗w+1 , . . . , Si∗w+w−1 where (i = 0, 1, . . . , wk − 1). The step-by-step algorithm is as follows.

1244

Circuits Syst Signal Process (2013) 32:1239–1256

1. Reference(R) is recorded at both the sender (Alice) and the receiver (Bob) end which is the normal packet flow between Alice and Bob. 2. Reference(R) is arranged in increasing order and divided into 2w buckets represented by β0 , β1 , . . . , β2w −1 . 3. If there are some common entries in the two adjacent buckets (β’s), then all the common entries will be put in the previous bucket (β). 4. Now for each Wi , its decimal equivalent (say d) is calculated and since Wi consists of w bits, the range of decimal equivalents lies between 0 to 2w − 1. A random value from that βd (where d = decimal equivalent of Wi ) is chosen and regarded as the current packet length to be sent. 5. Bob can determine the corresponding bucket number βd and the binary equivalent (d) is regarded as the secret w bit data. Liping Ji [10] has done this experiment by keeping the values of w = 2; thus for each packet transmission 2 bits of secret data are sent. As the selection of packet length is from the Reference which is constructed from the normal flow of packet length, the stego (embedded) packet length distribution follows the normal length distribution. 2.3 Analysis of Liping Ji’s Second Scheme It can be shown mathematically that the probability of choosing a packet of given size (say s), having a given frequency (total count) of occurring (say x%) in normal flow is the same as the probability of choosing the same packet length from the proposed algorithm by Ji [10]. This probability is independent of the total number of secret bits (w) sent in one packet. Let the size of the normal packet samples be n, and a packet of size s, has the frequency of occurring x% in the total number of n packets. Let the size of the Reference(R) (say Rlen ) constructed be r% of n. This implies that r ×n . (1) Rlen = 100 Since the Reference is constructed from the normal flows only, the number of len packets of size s in the reference Reference(R) is x×R 100 . Now from Ji’s second algorithm [10], the Reference (of size Rlen ) is distributed in r×n 2w buckets and each bucket contains 100×2 w packets. Let the probability of choosing a packet of sizes from the normal packet sets be pnormal ; thus pnormal can be calculated as x . (2) pnormal = 100 Again, let the probability of choosing a packet of sizes from buckets β’s [refer to Sect. 2.2] using Ji’s second algorithm [10] be pstego ; thus pstego can be calculated as pstego = p1 × p2 ,

(3)

where p1 = probability of choosing the particular bucket in which s lies, so p1 =

1 , 2w

(4)

Circuits Syst Signal Process (2013) 32:1239–1256

1245

where a number w of secret bits is embedded by altering the length of a single packet [refer to Sect. 2.2] and p2 = probability of choosing a packet of size s from that bucket, i.e., p2 =

no. of packets having size s , no. of total packets in bucket

i.e., p2 =

x×Rlen 100 Rlen 2w

(5)

.

So pstego can be calculated as pstego =

1 × 2w

x×Rlen 100 Rlen 2w

=

x . 100

(6)

From Eqs. (2) and (6), it can be concluded that the probability of choosing a packet of a particular length is the same for both normal and stego traffic, so for any value of w Ji’s second algorithm [10] will always produce a stego packet length distribution similar to the normal packet length distribution. In Fig. 4, the packet length distributions for both normal and stego traffic have been shown. It can be observed from Fig. 4 that the probability distribution graphs for normal and stego packet flow are the same, which supports the analytical results. Some higher order statistical measurements like mean, standard deviation, kurtosis, skewness, mean absolute deviation (MAD), mode, interquartile range (IQR), etc., are tabulated in Table 1 for both normal and stego packet length distributions. It is observed from Table 1 that these statistical properties hardly distinguish between the normal and the stego packet length distribution.

3 Proposed Detection Scheme In packet length based steganography, normal packet length distribution is altered due to embedding. Different packet length statistics can be analyzed to find the anomalies that possibly occur due to embedding. 3.1 First Order Statistical Analysis of Packet Length Sequence 3.1.1 Packet Length Analysis A first order statistics is determined from the time series of the packet lengths (say, packet length sequence) and is called the Packet Length Vector (χ). This χ is defined as a vector containing the number of packets for each of the valid packet lengths. In other words, χ represents the relative frequency of a particular packet length in the series of packets. The χ of a packet length sequence with a range of packet lengths [0 . . . L − 1] is represented by the following discrete function: nk (7) χ(rk ) = , n

1246

Circuits Syst Signal Process (2013) 32:1239–1256

Fig. 4 A comparison between normal and stego flow using the scheme of [10]

where rk is the kth packet length in the packet length sequence, nk is the number of packets with packet length = rk and n is the total number of packets in the packet length sequence. The Packet Length Vector (χ ) is given as χ = χ(r0 ), χ(r1 ), χ(r2 ), . . . , χ(rL−1 ) (8) or simply

χ = χ(0), χ(1), χ(2), . . . , χ(L − 1) .

(9)

A typical Packet Length Vector (χ ) for a network packet length sequence is shown in Fig. 5. 3.1.2 Effect of Embedding on χ Since the steganographic embedding is done by modifying the length of the packets, the Packet Length Vector (χ ) is altered due to embedding. A steganographic communication is suspected if any distinguishing statistics can be figured out from this

Circuits Syst Signal Process (2013) 32:1239–1256

1247

Fig. 5 Packet Length Vector for normal packet stream

alteration of the Packet Length Vector (χ ) as a result of embedding. In Fig. 6, a Packet Length Vector for a normal packet length sequence and the corresponding embedded packet length sequence is shown. It is observed from Fig. 6, that the χ plot for packet length sequence is altered due to embedding. Harmsen et al. [8] first addressed the effect of embedding on the first order statistics of the cover in the case of image steganography. They claimed that steganographic embedding, and specifically least significant bit (LSB) embedding on an image, smoothen the image histogram and that this smoothness can be quantified. In our study, it is observed that packet length based steganography alters the Packet Length Vector (χ ). This embedding effect can be quantified in a way similar to that discussed in [8]. To quantify the effect of embedding on Packet Length Vector (χ ), an m-point discrete Fourier transform (DFT) is performed on χ . Let χdft be the DFT transformed version of the χ . The center of mass (COM) of χdft is then determined as in the method proposed by Harmsen et al. in [8] to detect the embedding impact. The COM of χdft can be calculated as follows: m iχdft COM = i=0 (10) m i=0 χdft where m is the length of the χdft vector. It is observed that, for packet length based network steganography, COM(Stego) is usually greater than COM(Cover), where Stego is an embedded packet stream with an embedded message and Cover is a normal packet stream. Furthermore, as the embedding process distorts Packet Length Vector (χ ), the numbers of local maxima and minima of Packet Length Vector (χ ) are likely to increase due to the embedding. A similar idea was proposed by Jun Zhang et al. [26] for detecting the LSB matching technique for never-compressed grayscale images. The corresponding metric [say distortion metric (δ)] can be defined as in [26]: 2χ(e) − χ(e − 1) − χ(e + 1) (11) δ= n

where n is the number of bins in Packet Length Vector (χ ) and e is the position of a maximum or minimum. For packet length based network steganography, δ(Stego) is

1248

Circuits Syst Signal Process (2013) 32:1239–1256

Fig. 6 Comparison between normal χ and stego Packet Length Vector

usually greater than δ(Cover), where Stego is the embedded packet stream and Cover is the normal packet stream. 3.2 Second Order Statistical Analysis of Packet Length Sequence Let N represent a normal packet length sequence of size γ and let S represent the corresponding stego packet sequence of the same size. Let size(N ) = γ and let P represent the packet length sequence, where P (i) is the ith packet length, i = 1, 2, . . . , γ . The adjacency histogram (Hld (P )) is defined as the number of times the two packets at a distance of l in the packet length sequence P have an absolute difference of d. So Hld (P ) = k implies that there are k instances where two packets at a distance l in the packet sequence P have an absolute difference of d (in packet lengths) between themselves. In this paper, Hld (N ) and Hld (S) are used to represent the adjacency histogram for the normal packets sample and the stego packets sample, respectively, and d and l are used to represent the terms difference and lag, respectively.

Circuits Syst Signal Process (2013) 32:1239–1256

1249

A step-by-step procedure is now given to formulate the distinguishing factor between cover and stego packet length sequence by analyzing the second order statistics: 1. Hld (N ) and Hld (S) is calculated for the normal and the stego packets sample, respectively, for difference(d) = 0, 1, 2, 3 . . . and lag(l) = 1, 2, 3, . . . . 2. The Normalized Adjacency Histogram Rld (P ) with difference (d) and lag (l) for the packet sequence P is defined by the following equation: Rld (P ) =

Hld . size(N )

(12)

3. The notation Rld (N ) and Rld (S), respectively, is used to represent the above Normalized Adjacency Histogram for the normal packets sample and the stego packets sample. 4. Rld (P ) is used as the proposed distinguishing statistic to separate out the normal and stego packets samples. 3.3 Feature Extraction and Classification 3.3.1 First Order Features The Packet Length Vector (χ ) is first determined for the HTTP packet length sequence. The COM and distortion metrics (δ) of Packet Length Vector (χ ) as discussed in Sect. 3.1.2 are taken as first order statistical features. 3.3.2 Second Order Features The averages of the Normalized Adjacency Histogram Rld (N ) and Rld (S) as defined in Sect. 3.2 with lag = 1, 2 and difference = 0 (i.e. R10 , R20 ) are taken as second order statistical features; these four features are all taken as second order statistical features. 3.3.3 Classifier First, the Fisher linear discriminant (FLD) is used to reduce the dimension of the feature space to a single dimension. Then the linear discriminant analysis (LDA) classifier [6] is trained using this single dimension feature.

4 Experimental Results 4.1 Experimental Setup 1. Dataset: In this paper, all experiments are done on the Clarknet dataset [5]. The Clarknet dataset contains the logs of 1.6 million HTTP queries for two weeks. All the queries are separated based on the client, and an HTTP request is performed for each of the clients using the Liping Ji algorithms [9, 10].

1250

Circuits Syst Signal Process (2013) 32:1239–1256

2. Evaluation Parameters for the Steganalytic Classifier: The detection performance of the steganalytic classifier is evaluated using the area under ROC (AROC) and detection accuracy (Pdetect ) of the receiver operating characteristics (ROC) plot. ROC plots are used to evaluate the performance of a binary classifier. A ROC curve is defined as a graphical plot of the sensitivity versus 1-specificity in signal detection theory as its discrimination threshold is varied. The ROC can also be represented equivalently by plotting the number of true positives (TPR = true positive rate) versus false positives (FPR = false positive rate). The detection accuracy (Pdetect ) is computed using Eqs. (13) and (14): Pdetect = 1 − Perror , (13) 1 1 Perror = × PFP + × PFN , (14) 2 2 where PFP , PFN are the probabilities of false positive and false negative, respectively, as discussed in [22]. A value of Pdetect = 0.5 shows that the classification is as good as random guessing, and Pdetect = 1.0 shows a classification with 100 % accuracy. 4.2 Detection of Liping Ji’s First Scheme Initially, first order statistical features derived from packet length sequence are used for detecting Ji’s first scheme [9]. Packet Length Vector (χ ) is calculated on normal network traffic which sends the genuine HTTP queries. The features as discussed in Sect. 3.1.2 are as follows: – The center of mass (COM) of the resultant Packet Length Vector (χ ). – Distortion metric (δ). These features are determined for both the normal traffic and stego (embedded) traffic (using Ji’s first scheme [9]). The FLD is used to reduce the two-dimensional feature space to a single-dimensional feature space. An LDA classifier has been trained using this reduced feature space. The ROC graph for Ji’s first scheme for different payloads is presented in Fig. 7. Payload = 20 implies that 20 % of packets are used for data embedding. The corresponding AROC and detection performance (Pdetect ) are tabulated in Table 2. It can be observed from Fig. 7 that AROC for the proposed detection scheme is very high (close to 0.5). This denotes a very high degree of accuracy for detecting the steganographic embedding. This inference is also supported by the results given in Table 2. Also, it is observed that the detection accuracy Pdetect is nearly 100 % for different payloads. Hence, it can be concluded that the proposed steganalysis can accurately detect Ji’s first scheme [9]. 4.3 Detection of Liping Ji’s Second Scheme As discussed in Sect. 3.3.2, the normalized adjacency histogram of normal (Rld (N )) and stego (Rld (S)) packet sequences with lag = 1, 2 and difference = 0 (i.e., R10 , R20 ) are taken as second order statistical features. In this section, initially it is shown

Circuits Syst Signal Process (2013) 32:1239–1256

1251

Fig. 7 The ROC curve for evaluating the detection performance of Liping Ji’s first scheme [9]

Table 2 Detection performance of the proposed detection scheme for Liping Ji’s first scheme [9]

Payload

Area under ROC

Pdetect

10 %

0.4827

0.9687

30 %

0.4950

1.0000

50 %

0.4975

1.0000

100 %

0.5000

1.0000

that these features are acting as distinguishing statistics for cover and stego traffic. Then the detection performance of the classifier, which is trained using these second order statistical features, is presented. 400 samples from the Clarknet dataset [5] are considered for the experiments. The reference is taken initially to be 100 % of the total normal dataset. 4.3.1 Distinguishing Statistics The second order statistical features with Reference size = 100 % obtained from packet length sequences for both normal traffic (Rld (N )) and stego traffic (Rld (S)) (averaged over 400 samples) have been plotted in Fig. 8 (with lag = 1 and difference = 0) and Fig. 9 (with lag = 2 and difference = 0), where the x axis denotes the packet sample number and the y axis denotes average Normalized Adjacency Histogram Rld (N ) and Rld (S). It is observed from Figs. 8 and 9 that the average of R10 (N ) and R20 (N ) for 400 reference samples is greater than that of R10 (S) and R20 (S), respectively.

1252

Circuits Syst Signal Process (2013) 32:1239–1256

Fig. 8 Comparison of R10 (N ) and R10 (S) for normal and stego traffic, Reference size = 100 %

Fig. 9 Comparison of R20 (N ) and R20 (S) for normal and stego traffic, Reference size = 100 %

Fig. 10 Comparison of R10 (N ) and R10 (S) for 400 samples with Reference size = 50 %

The same features obtained from packet length sequence for both normal traffic (Rld (N)) and stego traffic (Rld (S)) (averaged over 400 samples) with Reference size = 50 % have been plotted in Fig. 10 (with lag = 1 and difference = 0) and Fig. 11 (with lag = 2 and difference = 0). It can be observed from Figs. 10 and 11 that Rl0 (N ) and Rl0 (S) are significantly different in normal and stego traffic. The averages of the Normalized Adjacency Histogram Rld (N ) and Rld (S) (for all 400 samples) with lag = 1, 2 and difference = 0 for different reference sizes are tabulated in Table 3.

Circuits Syst Signal Process (2013) 32:1239–1256

1253

Fig. 11 Comparison of R20 (N ) and R20 (S) for 400 samples with Reference size = 50 % Table 3 Comparison of Rld (N ) and Rld (S) values for normal and stego packet samples

Reference

l = 1, d = 0

l = 2, d = 0

R10 (N )

R10 (S)

R20 (N )

R20 (S)

50 %

0.02516

0.0090

0.02274

0.00900

80 %

0.02516

0.0079

0.02272

0.00741

100 %

0.02405

0.00783

0.0223

0.00729

It can be observed from Table 3 that R10 (N ) and R20 (N ) (averaged over 400 samples) are always larger compared with R10 (S) and R20 (S). Thus these statistics can be used to distinguish between normal and stego packet length distributions. 4.3.2 Justification In the Liping Ji [10] scheme, a bucket is chosen based on the Wi ’s [refer to Sect. 2.2]. Since the distribution of Wi ’s is random (assuming secret bits are random), the buckets are chosen randomly. Again a packet is randomly chosen from the selected bucket, so the stego dataset is just a new combination of the normal packet sample. From the above facts, the stego flow can be imagined as a reordered version of the normal flow. This reordering is the combined effect of both the above randomnesses in selecting the packet lengths. The statistical anomalies due to this reordering are reflected by the proposed distinguishing statistics (Rld (P )) [refer to Sect. 3.2]. On comparing the ratios Rld (N ) and Rld (S) for different lag(l) and difference(d) values, the effects of the reordering can be traced in the stego packet flow and thus can be used for differentiating stego packet samples from the normal packet sample. 4.3.3 Detection Performance of the Proposed Scheme The Normalized Adjacency Histograms Rld (N ) and Rld (S) as defined in Sect. 3.2 with lag = 1, 2 and difference = 0 (i.e., R10 , R20 ) are taken as features to train the classifier. The FLD is used to reduce the two-dimensional feature space to a single-dimensional feature space. An LDA classifier has been trained using this reduced feature space. The generated ROC graph for the proposed detection scheme for Ji’s second scheme [10] for different payloads is shown in Fig. 12. It can be observed from Fig. 12 that the proposed scheme can detect the steganographic embedding using

1254

Circuits Syst Signal Process (2013) 32:1239–1256

Fig. 12 The ROC curve for evaluating the detection performance of Liping Ji’s second scheme [10] Table 4 Detection performance of the proposed detection scheme for Liping Ji’s second scheme [10]

Payload

Area under ROC

Pdetect

30 %

0.4461

0.9075

50 %

0.4858

0.9412

80 %

0.4961

0.9570

100 %

0.4964

0.9575

Ji’s second algorithm [10] for a relatively higher payload. The area under ROC and the detection accuracy (Pdetect ) are tabulated in Table 4 for different payloads. It can be observed from Table 4 that the proposed detection scheme can detect Ji’s second scheme [10] with considerable accuracy. 4.4 Comparison with Existing Steganalysis Schemes To the best of the authors’ knowledge, the proposed steganalysis scheme is the first attempt to detect network packet length based steganography. Most of the earlier attempts to detect network steganography (e.g., [1]) attack the specific steganography, e.g., using a header based approach. Thus, the main contribution of this paper is to propose a way to detect network packet length based steganography using statistical analysis of the network packet sequence. Moreover, it is mathematically shown in Sect. 2.3 that Liping Ji’s second scheme [10] will always produce a stego packet length distribution similar to the normal packet length distribution. This implies that the features taken from the first order statistics (Packet Length Vector) of the packet length distribution may not detect the

Circuits Syst Signal Process (2013) 32:1239–1256

1255

steganographic perturbations due to Liping Ji’s second scheme [10]. This theoretical analysis (refer to Sect. 2.3) dictates that we take second order statistical features to design a steganographic classifier for the detection of Liping Ji’s second scheme [10].

5 Conclusion In this paper, a detection scheme for network packet length based steganography has been proposed. Sets of features have been devised depending on the first and second order statistics of packet length distribution. The center of mass (COM) and distortion metric (δ) of the Packet Length Vector (χ) are taken as first order features, whereas the Normalized Adjacency Histogram Rld with lag(l) and difference(d) is taken as the second order feature to train a supervised learning based classifier. It is also experimentally shown that the proposed classifier can detect the presence of steganographic embedding using a set of recent packet length based schemes with a considerably high accuracy. It is observed that the classifier based on first order statistics can accurately detect Ji’s first scheme [9], whereas second order statistical features are needed to detect Ji’s second scheme [10]. The observed detection performance is also justified in Sect. 4. The proposed scheme is simple and can be easily extended to detect any other packet length based steganography.

References 1. K. Ahsan, D. Kundur, Covert channel analysis and data hiding in TCP/IP. MSc thesis, Dept. of Electrical and Computer Engineering, University of Toronto, August 2002 2. K. Ahsan, D. Kundur, Practical data hiding in TCP/IP, in ACM Workshop on Multimedia and Security, (2002). http://ee.tamu.edu/~deepa/pdf/acm02.pdf 3. S. Cabuk, C.E. Brodley, C. Shields, IP covert channel detection, in ACM Transaction on Information and System Security, vol. 12 (2009), pp. 22.1–22.29 4. S. Cabuk, C.E. Brodley, C. Shields, IP covert timing channels: design and detection, in Proceedings of the 11th ACM Conference on Computer and Communications Security (2004) 5. Clarknet dataset. http://ita.ee.lbl.gov/html/contrib/ClarkNet-HTTP.html 6. J. Fridrich, Feature-based steganalysis for JPEG images and its implications for future design of steganographic schemes, in Proceedings of the 6th International Workshop on Information Hiding (2004), pp. 67–81 7. C.G. Girling, Covert channels in LANs. IEEE Trans. Softw. Eng. SE-13(2), 292–296 (1987) 8. J. Harmsen, W. Pearlman, Steganalysis of additive noise modelable information hiding, in Proceedings of the Security and Watermarking of Multimedia Contents V, vol. 5020 (2003), pp. 131–142 9. L. Ji, W. Jiang, B. Dai, X. Niu, A novel covert channel based on length of messages, in Proceedings of the International Symposium on Information Engineering and Electronic Commerce (2009), pp. 551– 554 10. L. Ji, H. Liang, Y. Song, X. Niu, A normal-traffic network covert channel, in Proceedings of the International Conference on Computational Intelligence and Security 2009, vol. 1 (2009), pp. 499– 503 11. B.W. Lampson, A note on the confinement problem. Commun. ACM 16(10), 613–615 (1973) 12. W. Mazurczyk, M. Smolarczyk, K. Szczypiorski, Hiding information in retransmissions. CoRR, abs/0905.0363 (2009). http://arxiv.org/ftp/arxiv/papers/0905/0905.0363.pdf 13. S.J. Murdoch, J. Steven, Lewis, Embedding covert channels into TCP/IP, in Proceedings of the Information Hiding: 7th International Workshop. LNCS, vol. 3727 (2005), pp. 247–261 14. A.S. Nair, A. Sur, S. Nandi, Detection of packet length based network steganography, in Proceedings of the International Conference on Multimedia Information Networking and Security (MINES 2010) (2010), pp. 574–578

1256

Circuits Syst Signal Process (2013) 32:1239–1256

15. A.S. Nair, A. Sur, S. Nandi, Network steganography—a brief survey, in Proceedings of the National Workshop on Design and Analysis of Algorithms (2010) 16. M.A. Padlipsky, D.W. Snow, P.A. Karger, Limitations of end-to-end encryption in secure computer networks. Tech. Rep. ESD-TR-78-158, Mitre Corporation (1978) 17. R.O. Preda, D.N. Vizireanu, A robust wavelet based video watermarking scheme for copyright protection using the human visual system. J. Electron. Imaging 20, 013–022 (2011) 18. R.O. Preda, D.N. Vizireanu, A robust digital watermarking scheme for video copyright protection in the wavelet domain. Measurement 43(10), 1720–1726 (2010) 19. R.O. Preda, D.N. Vizireanu, Quantization based video watermarking in the wavelet domain with spatial and temporal redundancy. Int. J. Electron. 98(3), 393–405 (2011) 20. Y. Quan-zhu, Z. Peng, Coverting channel based on packet length. Comput. Eng. 34(3) (2008) 21. S.H. Sellke, C. Wang, S. Bagchi, N.B. Shroff, TCP/IP timing channels: theory to implementation, in Proceedings Infocom 2009 (2009), pp. 2204–2212 22. K. Solanki, A. Sarkar, B.S. Manjunath, YASS: yet another steganographic scheme that resists blind steganalysis, in Proceedings of the 9th International Workshop on Information Hiding (2007), pp. 16– 31 23. K. Szczypiorski, A performance analysis of HICCUPS—a steganographic system for WLAN. CoRR, abs/0906.4217 (2009). http://arxiv.org/abs/0906.4217 24. Z. Trabelsil, H. El-Sayed, L. Frikha, T. Rabiel, Traceroute based IP channel for sending hidden short messages, in Proceedings of the Advances in Information and Computer Security (2006), pp. 421–436 25. S. Zander, G. Armitage, P. Branch, Covert channels in the IP time to live field, in Proceedings of the Australian Telecommunication Networks & Applications Conference (ATNAC) (2006) 26. J. Zhang, I.J. Cox, G. Doerr, Steganalysis for LSB matching in images with high-frequency noise, in Proceedings of the IEEE 9th Workshop on Multimedia Signal Processing, MMSP 2007 (2007), pp. 385–388

Steganalysis of Network Packet Length Based Data Hiding

Recommend Documents