J Electron Test DOI 10.1007/s10836-014-5453-9
Low-Power Scan Testing: A Scan Chain Partitioning and Scan Hold Based Technique Efi Arvaniti & Yiorgos Tsiatouhas
Received: 20 December 2013 / Accepted: 12 May 2014 © Springer Science+Business Media New York 2014
Abstract Power consumption during scan testing operations can be significantly higher than that expected in the normal functional mode of operation in the field. This may affect the reliability of the circuit under test (CUT) and/or invalidate the testing process, increasing yield loss. In this paper, a scan chain partitioning technique and a scan hold mechanism are combined for low power scan operation. Substantial power reductions can be achieved without any impact on the test application time or the fault coverage and without the need to use scan cell reordering or clock and data gating techniques. Furthermore, the proposed design solution for scan power alleviation permits the efficient exploitation of X-filling techniques for capture power reduction or the use of extreme (power independent) compression techniques for test data volume reduction.
Keywords Scan testing · Design for test (DfT) · Low power scan
Responsible Editor: P. Girard
This research has been co-funded by the European Union (European Social Fund) and Greek national resources under the framework of the “Thales” project of the “Education & Lifelong Learning” Operational Program.
E. Arvaniti · Y. Tsiatouhas (*) Department of Computer Science and Engineering, University of Ioannina, Ioannina, Greece e-mail: [email protected]
E. Arvaniti e-mail: [email protected]

1 Introduction Among testing techniques, scan testing is a valuable solution for both built-in self test (BIST) and non-BIST (external) testing schemes. In a scan design the memory elements of a
circuit are dynamically configured as a shift register, aiming to increase the controllability and observability of internal circuit nodes. There are two distinct phases in the scan operation: the shift phase, where test data are shifted in/out of the chain, and the capture phase, where the responses of the combinational logic are captured. Today, power consumption during integrated circuit testing procedures is a great concern since it can be several times higher than that during the normal mode of operation. This situation can affect the reliability or even cause structural damage of the CUT due to overheating and electromigration phenomena [19]. In addition, the elevated temperature can degrade the speed performance of the CUT and result in erroneous test responses that will invalidate the testing process and lead to yield loss. Scan testing power consumption is an open issue. The excessive switching activity of the CUT during scan operations may violate the specification limits on power supply IR and L·di/dt drop, which in turn increases the probability of noise-induced test failures. Various techniques have been proposed in the literature for the reduction of dynamic power dissipation during test application. Initially, a method to reduce power in sequential circuit testing is to decrease the test frequency [20, 28]; but the test time increases. Moreover, the power supply can be lowered during testing to further reduce power consumption [20]. To eliminate the dynamic power dissipation of the combinational logic during the shift operations in scan testing, data gating techniques at the outputs of the scan cells can be exploited [14]. The drawback in that case is the delay penalty during the normal mode of operation. Several scan cell local clock gating techniques have been proposed [1, 25, 32] for low power, although clock skew problems in the normal mode turn out to be a major disadvantage.
In addition, a method that uses two nonoverlapping clocks working at half of the initial frequency and feeding separate partitions of the scan chain is discussed in [5]. A dedicated power supply gating technique has been presented in [4] in order to avoid power consumption in the
combinational logic. However, applying this scheme requires the transistor-level redesign of a large number of standard cells in a library. A simple approach for low power testing is the re-ordering of the test vectors used [8, 13, 15, 27] or of the scan cells [11] to minimize the switching activity. Although test vector re-ordering does not induce overhead in test application time, it is characterized by high computation times due to the problem complexity [20]. Moreover, scan cell re-ordering may increase design and silicon area cost. A technique to deactivate part of the parallel scan chains that are not involved in a scan session during the capture phase and to feed them with constant values during the shift phase has been proposed in [12]. This approach may affect the coverage of un-modeled faults, while for the reduction of the scan-out switching activity local clock gating is required. Low power oriented test pattern generation techniques have also been proposed [29]. Equivalently, a practical solution is the reassignment of don’t care bits in the test cubes (X-filling) such that the switching activity is reduced [3, 6, 19, 21, 24, 31]. However, test cube bit re-assignment techniques usually cannot achieve the same amount of power reduction as hardware-based techniques do [19]. In [26] a scan chain modification is introduced for scan power reduction by inserting logic gates in-between the scan cells. This approach requires large computational effort for the determination of the insertion points in large cores and, depending on the CUT, its effectiveness may be limited. A scan chain partitioning technique which uses multiplexers and scan cell re-ordering is presented in [18] in order to avoid scan power dissipation and reduce test application time. A partition is not involved in the scan-in/out operations when all cells in it have don’t care values in both the test vector and the corresponding response vector.
The main drawback of this technique is the need for scan cell re-ordering in order to achieve acceptable results. In addition, the coverage of unmodeled faults may be significantly reduced. A similar topology for scan testing acceleration has been presented in [23]. The jump scan architecture has been proposed in [10]. Each flip-flop in the chain is modified so that its master and slave latches are working as two independent latches during the scan mode of operation, with the use of an additional multiplexer in-between them. The scan-in/out power is reduced since half of the slave latches are bypassed for each bit that
is shifted in the scan chain. A major disadvantage of this approach is that the above modification reduces the speed performance of the functional circuit during the normal mode of operation. Another low power scan chain partitioning technique has been proposed in [9]. In this work, the parallel scan chains (segments) of an Illinois topology, except one that is used as reference, are divided into an equal number of partitions with the use of multiplexers. Initially, input test data are scanned in only to the reference segment while the remaining segments are loaded with zeros. In parallel, all segments upload the response data they store. Finally, during the shift operations at the last partition of the reference segment, all partitions in the remaining segments are fed with the same data that feed the corresponding partition in the reference segment. The main limitation of this approach is that it is capable of reducing only the scan-in power but not the scan-out power in a chain. Moreover, it is applicable only to Illinois-based scan chains. In [22] an efficient technique has been proposed that is capable of reducing the scan-out power. The concept of the reference segment from [9] is exploited. Again, the scan chains are partitioned and extra compactors are inserted in-between the partitions. The compacted response test data are scanned out through the reference segment for power reduction. By combining the two techniques in [9] and [22], as proposed in [22], both scan-in and scan-out power reductions can be achieved. However, the applicability of the combined version is still limited to Illinois-based designs. Recently, in [16, 17] a scan segmentation technique for low power testing has been presented, which is supported by scan freeze flip-flops and a proper status register. Before test vector insertion the status register is loaded.
Test data are shifted through a single segment (or group of segments) under the control of the status register, while the remaining segments stay “frozen”. This scheme increases the test time since it requires extra clock cycles per test vector to scan in configuration data to the status register, especially when the target is not to decrease the coverage of un-modeled faults. Furthermore, this technique is not BIST compliant. Finally, in [33] and [7] two-stage scan architectures are presented, where flip-flops are included as “leaf cells” in a scan chain for test application time reduction; limited scan power reductions are reported as a side effect. A scan chain partitioning technique for low power scan testing is presented in this work. It is suitable for BIST and
Fig. 1 a) Original scan chain (length L) and b) scan chain partitioning and shift operation (p partitions of length L/p, with multiplexer select signals MODE1 ... MODEp)
Fig. 2 Scan cell with hold mode of operation
2 Low Power Scan Architecture During scan testing operations, the dynamic power consumption depends on the switching activity (transitions) of the scan chain, the combinational logic and the clock distribution tree.
Fig. 3 Partitioning of multiple parallel scan chains (segments)
Thus, circuit switching activity is commonly used for power dissipation estimation. The target in low power oriented scan testing techniques is the switching activity of the combinational logic. The weighted transition count (WTC) metric [8, 24] is a well-known and widely accepted power consumption estimation tool in scan chain based designs. According to this metric, the power consumption for a given vector depends on the number of subsequent bit transitions in it and the relative position of these transitions. The WTC at the pseudo-primary inputs of the combinational logic during the scan-in/out process is strongly correlated to the pertinent switching activity on the internal nodes of the CUT [8, 24]. Consequently, the higher the WTC of the scanned-in test vector or the scanned-out response vector, the higher the power consumption in the CUT. Next, let us consider a scan chain of length L. Moreover, assume a test vector $t_j = (t_{j,1}, t_{j,2}, \ldots, t_{j,L})$, where $t_{j,s+1}$ is scanned in before $t_{j,s}$ and so on, and the corresponding response vector $r_{j-1} = (r_{j-1,1}, r_{j-1,2}, \ldots, r_{j-1,L})$ in the scan chain, from the application of the previous test vector $t_{j-1}$, where $r_{j-1,s+1}$ is scanned out before $r_{j-1,s}$ and so on. The total scan chain WTC for the
non-BIST test environments to effectively reduce/adjust scan power dissipation down to the required levels. By exploiting the proposed scheme, low power oriented scan cell reordering, clock, data or power supply gating techniques are avoided. In addition, the new technique can be efficiently combined, a) with existing X-filling techniques to reduce capture power and b) test data compression techniques for test data volume reduction. A preliminary, poster, version of the work has been presented in [2]. The paper is organized as follows. In Section 2, the proposed scan chain architecture is introduced and its operation is analyzed. In Section 3, comparisons with existing low power scan solutions are discussed. Next, in Section 4, experimental results from the application of the new technique on benchmark circuits are presented in order to validate its low power efficiency. Finally, in Section 5 the conclusions are drawn.
Fig. 4 The Mode-Register (at the initial state)
combination of these two vectors, during the pertinent scan-in/out session, is given by the following expression:
$$\mathrm{WTC} = L\,(t_{j,L} \oplus r_{j-1,1}) + \sum_{i=1}^{L-1} i\,(t_{j,i} \oplus t_{j,i+1}) + \sum_{i=1}^{L-1} (L-i)\,(r_{j-1,i} \oplus r_{j-1,i+1}) \qquad (1)$$
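As an illustration only, the WTC expression (1) can be evaluated with a few lines of Python. The function name `scan_wtc` and the list-based bit representation are assumptions made for this sketch; index i-1 of each list stands for scan cell i, so the last element of `test` is the first bit scanned in and the first element of `prev_resp` is the last bit scanned out.

```python
from typing import Sequence


def scan_wtc(test: Sequence[int], prev_resp: Sequence[int]) -> int:
    """Weighted transition count of expression (1) for one scan-in/out session.

    test[i-1]      holds t_{j,i}   (bit to be loaded into scan cell i)
    prev_resp[i-1] holds r_{j-1,i} (bit currently stored in scan cell i)
    """
    L = len(test)
    if len(prev_resp) != L:
        raise ValueError("test and response vectors must have equal length")

    # Boundary term: the last response bit out (r_1) meets the first test bit in (t_L).
    wtc = L * (test[L - 1] ^ prev_resp[0])

    for i in range(1, L):  # i = 1 .. L-1, as in expression (1)
        # A transition between t_i and t_{i+1} travels through i cells.
        wtc += i * (test[i - 1] ^ test[i])
        # A transition between r_i and r_{i+1} travels through L - i cells.
        wtc += (L - i) * (prev_resp[i - 1] ^ prev_resp[i])
    return wtc
```

For example, loading the vector (1, 0, 1, 0) into a four-cell chain that holds an all-zero response gives weights 1 + 2 + 3 for the three test-vector transitions and no contribution from the response or boundary terms.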
The idea behind the proposed low power scan operation is the partitioning of the scan chain and the application of the scan-in/out operations in each partition separately while the rest partitions remain stable in a hold mode of operation. This way, during the scan shift operations, the number of signal transitions at the pseudo-primary inputs of the combinational logic (inputs driven by scan-cells) is drastically reduced and consequently the same stands for the power consumption. 2.1 Scan Chain Structure and Operation The original and the proposed scan chains are presented in Fig. 1. The scan chain is partitioned and multiplexers are placed in-between each partition. Setting to “high” the select input (signal MODEj) of a single multiplexer at the output of a partition, while the select inputs of the rest multiplexers are kept to “low”, we permit scan-in/out operations only to this particular partition (active partition) while the rest partitions are bypassed (Fig. 1b). However, by performing scan-in/out operations on a single partition the data in the rest partitions are corrupted, unless the corresponding flip-flops are set in a hold mode of operation to retain their data. A simple way to achieve this hold mode of operation is to block the clock signal in the pertinent partitions using local clock gating techniques like in [32]. In each partition a single AND gate is inserted in the clock distribution network. This gate is fed by the clock signal and the MODEj signal of the corresponding multiplexer in order to block the clock. In the normal mode of operation all MODEj signals are “high”. Although this is a low cost approach, it is not a preferable design style due to clock skew related problems that local clock gating techniques insert in the clock distribution network. We adopt an alternative solution for the hold mode of operation where the stored data in a scan flip-flop re-feed its input.
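The effect of shifting one partition at a time while the others hold can be sketched with a small behavioural model. This is not the authors' implementation: the function names, the list representation of the chain, the rightmost-partition-first activation order, and the restriction to a chain length that is a multiple of p are assumptions of the sketch. It counts how many scan cells change value per clock, which is the activity seen at the pseudo-primary inputs.

```python
from typing import List, Sequence, Tuple

Result = Tuple[List[int], List[int], int]  # (final cells, scan-out stream, cell toggles)


def shift_standard(cells: Sequence[int], test: Sequence[int]) -> Result:
    """Shift `test` into a plain scan chain whose cells currently hold `cells`."""
    state, out, toggles = list(cells), [], 0
    for bit in reversed(test):            # t_L is scanned in first
        out.append(state[-1])             # r_L is scanned out first
        new_state = [bit] + state[:-1]    # every cell shifts on every clock
        toggles += sum(a != b for a, b in zip(state, new_state))
        state = new_state
    return state, out, toggles


def shift_partitioned(cells: Sequence[int], test: Sequence[int], p: int) -> Result:
    """Same session with p equal partitions; only the active partition shifts."""
    L = len(cells)
    if L % p:
        raise ValueError("sketch assumes L is a multiple of p")
    k = L // p
    state, out, toggles = list(cells), [], 0
    for j in range(p - 1, -1, -1):        # assumed order: rightmost partition first
        lo, hi = j * k, (j + 1) * k
        for bit in reversed(test[lo:hi]):
            part = state[lo:hi]
            out.append(part[-1])          # bypass multiplexers route it to Scan-Out
            new_part = [bit] + part[:-1]  # partitions in hold mode keep their data
            toggles += sum(a != b for a, b in zip(part, new_part))
            state[lo:hi] = new_part
    return state, out, toggles
```

In this model both functions take exactly L clock cycles, load the same test vector, and emit the same scan-out stream; only the number of cell toggles differs.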
In Fig. 2 a scan flip-flop with a re-feeding mechanism to implement the hold mode of operation is presented. The used three-state buffers are controlled by the select signal MODEj of the multiplexer at the output of the corresponding partition where the flip-flop belongs. Since these buffers are not on critical paths for the circuit normal operation, they can be replaced by simple pass-gates to reduce the pertinent cost. The operation of the proposed scan chain is as follows. Let us consider that the scan chain is divided into p partitions. In the scan mode, the partitions are successively activated to perform the required scan-in/out operations. Initially the rightmost partition is activated (though in general the order of partition activation can be random) that is MODEp =“1” and MODEj =“0” ∀ j
Fig. 5 a) Mode-Register clock generation and b) operation flowchart
Fig. 6 Signal waveforms
Fig. 7 The proposed scan chain partitioning scheme in a BIST topology
Scan_EN=“high”), a single pulse is shifted in the Mode-Register (from right to left) to successively activate the partition clusters. In addition, two counters can be used to automate the shifting of the pulse in the Mode-Register (see Fig. 5a). The first counter (P-Counter) is log2(p) bits wide and counts the number of partitions. The second counter (C-Counter) is log2(L/p) bits wide (where L is the length of each segment) and counts the number of cells in each partition. Each time the P-Counter is incremented, the Mode-Register is shifted one position for the activation of a new cluster. Then, the C-Counter performs a complete count cycle (L/p counts) and provides the necessary cycles for the completion of the scan-in/out operation in the activated cluster. Next, the C-Counter is nullified and the P-Counter is triggered to activate the next cluster and so on until the completion of the scan-in/out operations in all partition clusters. So far, the new test vector has been applied, the previous response vector has been extracted and the P-Counter has accomplished a complete count cycle (p counts). Afterwards, the Scan_EN signal is turned to “low” and sets a) the circuit into the normal mode of operation in order to capture the response data of the applied test vector and b) the Mode-Register to the all “high” state. Then, both counters are nullified and the Mode-Register is initialized.
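The counter-driven control sequence described above can be sketched as a generator that yields the MODE signals for each clock cycle of one test vector. The name `mode_schedule`, the tuple encoding (MODE1 first, MODEp last), and the requirement that L be a multiple of p are assumptions of this sketch; it models behaviour only, not the register-level circuit of Fig. 4.

```python
from typing import Iterator, Tuple


def mode_schedule(L: int, p: int) -> Iterator[Tuple[str, Tuple[int, ...]]]:
    """Yield (phase, (MODE1, ..., MODEp)) for every clock cycle of one test vector."""
    if L % p:
        raise ValueError("sketch assumes L is a multiple of p")
    mode = [0] * (p - 1) + [1]            # initial state: only the rightmost cell is high
    for p_count in range(p):              # P-Counter: one count per partition cluster
        if p_count:                       # MR-CLK pulse: shift the single 1 one place left
            mode = mode[1:] + [0]
        for _ in range(L // p):           # C-Counter: L/p shift cycles in the active cluster
            yield "shift", tuple(mode)
    # Scan_EN low: Mode-Register forced to the all-high state, response is captured.
    yield "capture", tuple([1] * p)
```

With L = 6 and p = 3 the schedule contains two shift cycles for each of the one-hot states 001, 010 and 100, followed by one capture cycle with all MODE signals high, so the shift phase still lasts L cycles in total.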
response vector has been extracted from the scan chain. Thus, the fault coverage is not affected. Moreover, the number of clock cycles required for the completion of the scan-in/out operations in the whole scan chain, according to the proposed design technique, is exactly the same as the corresponding number of clock cycles required in the original scan chain. Thus the test application time is not increased. The proposed technique can be extended to multiple parallel scan chains (segments), as shown in Fig. 3 where s segments are considered. In all segments, partitions with identical partition numbers (a partition cluster) are activated in parallel. Again, the number of clock cycles required for the completion of the scan-in/out procedures in all segments is exactly the same as the corresponding number in the standard design. The capture phase is not affected. In order to realize the new scan chain architecture in Fig. 3, a dedicated control signal MODEj is required for each partition cluster. For the generation of these signals during the scan mode of operation, a p-bit auxiliary shift register is exploited. We will call this register Mode-Register (see Fig. 4). The Mode-Register is initialized (by sequentially setting the Scan_EN (scan enable) and MR-CD (clear) signals at “low”) to the all zero state, except the rightmost cell that is set to “high”. During the scan mode of operation (where
Fig. 8 Partial implementation of the proposed scan chain partitioning scheme
2.2 Partial Implementation If the silicon area requirements of the proposed scan chain architecture are considered too high, or if the power reduction is higher than that required by the specifications for test efficiency, a partial implementation can be adopted instead. In Fig. 8, the partial implementation of the new scheme is illustrated. Among the parallel scan chains of the design, only a portion (selected using cost and power criteria) is modified according to the proposed technique. The rest of the chains remain standard scan chains utilizing the standard scan flip-flop. The operation of this scheme follows exactly the procedures discussed in Figs. 5b and 6. The operation of the standard scan chains is not affected by these procedures, since the scan cycles are exactly the same as in the original scan chain, and the scan-in/out process is performed in the traditional way.
Fig. 9 Launch-on-capture at speed testing signal waveforms
2.3 At Speed Scan Testing In modern nanometer technology designs, at speed scan testing techniques are mandatory [30]. The proposed architecture supports at speed testing either in its full or partial implementation without any extra cost. In Fig. 9, signal waveforms for the application of the well known launch-on-capture at speed testing technique on the proposed topology are presented. After L cycles for the scan-in of the new vector (initialization vector) and the scan-out of the previous test response the scan enable signal (Scan_EN) is deactivated and a pair of fast pulses is applied to launch the test vector and capture the test response respectively. Note that the deactivation of the scan
The above procedure is repeated until the application of all desired test vectors, as it is presented in the flowchart of Fig. 5b. Note that in the normal mode of operation the Mode-Register remains stable to the all “high” state (Scan_EN=“low”). In Fig. 6 the signal waveforms for the operation of the proposed scan chain architecture are illustrated. The above design for testability circuitry is embedded in the CUT and can be controlled either by an external automatic test equipment (ATE) or a BIST controller. In case that a BIST solution is utilized, the overall scheme is shown in Fig. 7, where the two counters and the Mode-Register are included in the BIST controller along with the Test Pattern Generator (TPG) and the Output Response Analyzer (ORA).
enable signal sets the Mode-Register to the all high state (see Fig. 4). Thus, the scan chains are set to the normal mode of operation for the application of the pair of launch and capture pulses. Then, a clear operation is performed on the Mode-Register (MR-CD=“low”) and another session of L cycles follows for the next pattern insertion and so on until the completion of the test set.
2.4 Illinois Scan Implementation The Illinois scan chain is a well known scan architecture for test application time reduction [30]. The Illinois scan architecture can be easily applied on the proposed scan testing scheme according to Fig. 10. As in the standard scan chain, simple multiplexers are inserted at the input of each parallel scan chain (except the first one). These multiplexers are controlled by the serial/broadcast selection signal S/B. In the broadcast mode all scan chains are fed in parallel with test data, while in the serial mode all scan chains form a single scan chain to serially scan in test data. In either the serial or the broadcast mode of operation, only the scan partitions of the active cluster are in the shift mode of operation, while the remaining partitions are in the hold mode. Thus, the same WTC reduction as earlier can be achieved.
Fig. 10 Illinois version of the proposed scan chain design technique
Table 1 Benchmark circuits’ characteristics
Benchmark      Inputs   Outputs   # Gates   # Flip-Flops
s38417         28       106       22,179    1,636
s38584         12       278       19,523    1,452
usb_funct      112      98        20,980    1,746
tv80           13       32        12,031    359
systemcaes     258      129       16,340    670
pci_bridge32   158      164       36,665    3,359
aes_core       258      129       26,620    530
ac97_ctrl      54       47        23,445    2,199
ethernet       93       105       94,428    10,544
wb_conmax      1,128    1,416     54,151    770

3 Comparisons As mentioned earlier, the proposed low power scan architecture can be applied in either BIST or non-BIST scan-based testing environments. In the BIST case, and especially when pseudorandom test vectors are used, software based techniques for low power scan operation, like X-filling [3, 6, 19, 21, 24, 31], are not applicable. Other effective hardware-based approaches, like [16, 17], are not compatible with BIST techniques. The design for testability scheme introduced in this work is an efficient solution for low power scan-based BIST.
In a non-BIST environment and particularly in deterministic testing, the proposed scan architecture can be easily combined with X-filling techniques [3, 6, 19, 21, 24, 31] to further reduce both the shift and the capture power consumption during scan testing. However, since in our case the shift power is significantly alleviated, X-filling effort can be exclusively devoted to reducing the capture power (e.g. with the use of the preferred fill technique [8]). Furthermore, test data compression techniques can be more effective without the need to consider low power issues during the scan operations (as it is stated in [8]), given that the proposed technique is exploited. Last but not least, we have to mention that the new scan architecture: a) does not require the re-ordering of the scan cells in the chain (as it is the case in [11]), b) does not affect the fault coverage of either modeled or un-modeled faults (as it is the case in [12, 16–18, 23]), c) does not increase the test application time (as it is the case in [16, 17, 28]), d) does not require the use of clock gating techniques (as it is the case in [1, 12, 25, 32]), e) has negligible influence on the speed performance and the power consumption of the functional circuit during the normal mode of operation (which is not the case in [4, 10, 14]), f) is compliant with the launch-on-
Table 2 Experimental results on WTC and implementation cost
Circuit        s   p    WTC                WTC reduction (%)   Size (unit transistors)   Size increment (%)
s38584         1   1    82,994,860         –                   296,411                   –
                   2    41,689,564         49.77               306,223                   3.31
                   3    27,477,052         66.89               306,334                   3.35
                   4    20,839,276         74.89               306,545                   3.42
                   6    13,854,044         83.31               306,667                   3.46
                   8    10,397,662         87.47               306,889                   3.53
                   10   8,603,392          89.63               307,111                   3.61
s38417         1   1    51,059,343         –                   281,208                   –
                   2    25,016,079         51.01               292,137                   3.89
                   3    16,728,231         67.24               292,248                   3.93
                   4    12,338,895         75.83               292,359                   3.97
                   6    8,597,127          83.16               292,581                   4.04
                   8    6,283,599          87.69               292,803                   4.12
                   10   5,468,999          89.29               293,025                   4.20
tv80           1   1    59,249,424         –                   132,216                   –
                   2    29,617,385         50.01               135,321                   2.35
                   3    19,771,752         66.63               135,432                   2.43
                   4    1,478,858          75.04               135,543                   2.52
                   6    9,864,342          83.35               135,765                   2.68
                   8    7,361,674          87.58               135,987                   2.85
                   10   5,876,482          90.08               136,209                   3.02
aes_core       1   1    86,974,932         –                   251,676                   –
                   2    43,483,662         50.00               255,849                   1.66
                   3    28,877,124         66.80               255,960                   1.70
                   4    21,725,064         75.02               256,071                   1.75
                   6    14,477,760         83.35               256,293                   1.83
                   8    10,868,644         87.50               256,515                   1.92
                   10   8,688,738          90.01               256,737                   2.01
Systemcaes     1   1    67,120,663         –                   198,285                   –
                   2    33,562,373         50.00               203,323                   2.54
                   3    22,382,145         66.65               203,434                   2.60
                   4    16,804,567         74.96               203,545                   2.65
                   6    11,201,739         83.31               203,767                   2.76
                   8    8,411,325          87.47               203,989                   2.88
                   10   6,717,617          89.99               204,211                   2.99
               2   1    33,562,477         –                   198,285                   –
                   2    16,793,451         49.96               203,275                   2.52
                   3    11,200,701         66.63               203,399                   2.58
                   4    8,404,071          74.96               203,523                   2.64
                   6    5,555,883          83.45               203,771                   2.77
                   8    4,210,854          87.45               204,019                   2.89
                   10   3,363,859          89.98               204,267                   3.02
wb_conmax      1   1    246,732,620        –                   543,961                   –
                   2    123,214,610        50.06               549,614                   1.04
                   3    82,424,944         66.59               549,725                   1.06
                   4    61,749,906         74.97               549,836                   1.08
                   6    41,165,288         83.32               550,058                   1.12
                   8    30,895,414         87.48               550,280                   1.16
                   10   24,698,654         89.99               550,502                   1.20
               2   1    123,164,610        –                   543,961                   –
                   2    61,771,306         49.85               549,566                   1.03
                   3    41,173,260         66.57               549,690                   1.05
                   4    30,827,041         74.97               549,814                   1.08
                   6    20,578,381         83.29               550,062                   1.12
                   8    15,447,765         87.46               550,310                   1.17
                   10   12,341,390         89.98               550,558                   1.21
usb_funct      1   1    1,758,690,097      –                   317,254                   –
                   2    879,882,901        49.97               328,850                   3.66
                   3    586,870,395        66.63               328,961                   3.69
                   4    438,607,537        75.06               329,072                   3.73
                   6    293,389,861        83.32               329,294                   3.80
                   8    219,537,221        87.52               329,516                   3.87
                   10   176,967,943        89.94               329,738                   3.94
               2   1    889,882,901        –                   317,254                   –
                   2    438,672,720        50.70               328,802                   3.64
                   3    293,389,861        67.03               328,926                   3.68
                   4    219,618,692        75.32               329,050                   3.72
                   6    146,223,379        83.57               329,298                   3.80
                   8    110,028,278        87.64               329,546                   3.87
                   10   88,224,106         90.09               329,794                   3.95
ac97_ctrl      1   1    773,239,939        –                   371,362                   –
                   2    386,869,007        49.97               385,701                   3.86
                   3    257,813,397        66.66               385,812                   3.89
                   4    193,555,369        74.97               385,923                   3.92
                   6    129,036,379        83.31               386,145                   3.98
                   8    96,601,816         87.51               386,367                   4.04
                   10   77,290,810         90.00               386,589                   4.10
               3   1    257,575,439        –                   371,362                   –
                   2    129,017,280        49.91               385,635                   3.84
                   3    85,922,985         66.64               385,772                   3.88
                   4    64,501,106         74.96               385,909                   3.92
                   6    42,992,124         83.31               386,183                   3.99
                   8    32,235,114         87.49               386,457                   4.06
                   10   25,775,230         89.99               386,731                   4.14
               6   1    128,669,163        –                   371,362                   –
                   2    64,323,563         50.01               385,639                   3.70
                   3    42,878,171         66.68               385,815                   3.75
                   4    32,141,187         75.02               385,991                   3.79
                   6    21,466,981         83.32               386,343                   3.88
                   8    16,070,959         87.51               386,695                   3.97
                   10   12,847,187         90.02               387,047                   4.05
pci_bridge32   1   1    5,685,778,630      –                   581,273                   –
                   2    2,842,307,479      50.01               602,617                   3.67
                   3    1,894,316,972      66.68               602,728                   3.69
                   4    1,420,891,234      75.01               602,839                   3.71
                   6    947,447,111        83.34               603,061                   3.75
                   8    710,330,302        87.51               603,283                   3.79
                   10   568,366,902        90.00               603,505                   3.82
               3   1    1,895,546,478      –                   581,273                   –
                   2    948,116,798        49.98               602,552                   3.66
                   3    631,943,348        66.66               602,689                   3.68
                   4    484,039,118        74.46               602,826                   3.71
                   6    315,964,926        83.33               603,100                   3.76
                   8    237,986,078        87.44               603,374                   3.80
                   10   189,521,262        90.00               603,648                   3.85
               6   1    129,036,209        –                   581,273                   –
                   2    64,695,882         49.86               602,556                   3.53
                   3    43,250,832         66.48               602,732                   3.56
                   4    32,514,183         74.80               602,908                   3.59
                   6    21,840,637         83.07               603,260                   3.64
                   8    16,445,271         87.26               603,612                   3.70
                   10   13,222,154         89.75               603,964                   3.76
Ethernet       1   1    226,883,623,870    –                   1,896,115                 –
                   2    113,439,283,342    50.00               1,960,691                 3.41
                   3    75,631,946,582     66.66               1,960,802                 3.41
                   4    56,727,199,414     75.00               1,960,913                 3.42
                   6    37,816,570,898     83.33               1,961,135                 3.43
                   8    28,362,485,009     87.50               1,961,357                 3.44
                   10   22,690,298,350     90.00               1,961,579                 3.45
                   16   14,181,438,786     93.75               1,962,245                 3.49
               4   1    56,727,197,274     –                   1,896,115                 –
                   2    28,362,483,510     50.00               1,960,621                 3.40
                   3    18,908,283,458     66.67               1,960,771                 3.41
                   4    14,181,438,786     75.00               1,960,921                 3.42
                   6    9,458,756,230      83.33               1,961,221                 3.43
                   8    7,090,674,626      87.50               1,961,521                 3.45
                   10   5,667,707,024      90.01               1,961,821                 3.47
                   16   3,546,178,075      93.75               1,962,721                 3.51
               8   1    28,412,493,510     –                   1,896,115                 –
                   2    14,181,438,786     50.09               1,960,651                 3.40
                   3    9,454,195,678      66.73               1,960,853                 3.41
                   4    7,092,509,100      75.04               1,961,055                 3.42
                   6    4,733,023,768      83.34               1,961,459                 3.45
                   8    3,545,582,826      87.52               1,961,863                 3.47
                   10   2,836,548,241      90.02               1,962,267                 3.49
                   16   1,883,684,671      93.37               1,963,479                 3.55
capture technique for at-speed testing and g) can be easily applied to Illinois-based scan topologies for power reduction in both the serial and the broadcast modes of operation.
4 Experimental Results The IWLS05, as well as two large ISCAS89 (s38417 and s38584), benchmark circuits were used for the evaluation of
the proposed technique. Table 1 provides the characteristics of these circuits. The general architecture in Fig. 3 was considered. Depending on the number of flip-flops, the scan chain is divided into segments (up to eight) and each segment to partitions (up to sixteen). All scan cells support the hold mode of operation like in Fig. 2. The test vectors used in the experiments have been extracted using the ATALANTA tool, where the random fill option was activated for X-bit assignment.
Fig. 11 WTC reduction with respect to s and p parameters for the Ethernet circuit
The experimental results are provided in Table 2. The first column presents the circuit under consideration. The second and the third columns provide the number of segments and partitions respectively used for the implementation of the scan architecture. The rows where the number of partitions is equal to one (p=1) refer to the original scan design. In the fourth column the scan-in/out WTC for the application of the whole test set to the CUT is given, while the fifth column shows the percentage reduction in the WTC for each configuration (s and p combination, where p>1) with respect to the original scan design (p=1). The circuit size (expressed by the required equivalent number of minimum/unit size transistors) in each configuration is provided in the sixth column of the table. Finally, in the seventh column the percentage increase in the circuit size is presented. As expected, the WTC reduction achieved is proportional to the number of partitions in the design. The same stands for the average WTC. In Fig. 11, an indicative 3-D graph of the WTC reduction with respect to the number of segments and partitions in the Ethernet circuit is presented. In general, the experiments showed that the WTC reduction is almost independent of the CUT, the test vectors’ application order and the ordering of the scan cells in the chain. An interesting parameter in low power scan testing is the peak WTC per test vector. According to the experimental results, the peak WTC reduction during the shifting operations is also proportional to the number of scan chain partitions. In Fig. 12, 3-D graphs are provided to illustrate the reduction of the peak WTC with respect to the number of partitions and the number of segments in the design, for the three IWLS05 circuits with the highest flip-flop count. The same trend stands for all the remaining circuits in Table 2.
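The fifth column of Table 2 can be recomputed from the fourth with a one-line formula, and compared with the value expected if the WTC dropped exactly p times. The function names below are hypothetical, and the "ideal" figure is an idealization used only for comparison, not a result reported by the authors.

```python
def wtc_reduction_percent(wtc_original: int, wtc_partitioned: int) -> float:
    """Percentage WTC reduction relative to the unpartitioned (p = 1) design."""
    return 100.0 * (1.0 - wtc_partitioned / wtc_original)


def ideal_reduction_percent(p: int) -> float:
    """Reduction expected if the WTC dropped exactly p times (an idealization)."""
    return 100.0 * (1.0 - 1.0 / p)


# s38584 with a single segment: WTC values taken from Table 2 for p = 1, 2 and 4.
measured_p2 = wtc_reduction_percent(82_994_860, 41_689_564)
measured_p4 = wtc_reduction_percent(82_994_860, 20_839_276)
```

For these two rows the recomputed reductions round to the tabulated 49.77 % and 74.89 %, close to the idealized 50 % and 75 %.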
Fig. 12 Peak WTC reduction with respect to s and p parameters
Table 3 Comparison results

Scan chain architecture                   Proposed   [5]    [15]    [26]+[27]   [1]
WTC reduction
  s=1,  p=1                               1          1      1       1           <2
  s=1,  p=4                               4          1      1       1           <2
  s=1,  p=8                               8          1      1       1           <2
  s=1,  p=16                              16         1      1       1           <2
  s=4,  p=4                               16         16     ≤16     2.29        <8
  s=4,  p=8                               32         16     ≤16     2.91        <8
  s=4,  p=16                              64         16     ≤16     3.37        <8
  s=8,  p=8                               64         64     ≤64     4.27        <16
  s=8,  p=16                              128        64     ≤64     5.57        <16
  s=16, p=16                              256        256    ≤256    8.26        <256
Area overhead                             <4 %       NG     NG      NA          NO
Performance degradation                   NG         NO     NO      NG          NO
Test application time increase            NO         NO     NO      NO          NO
Un-modeled faults coverage degradation    NO         NO     YES     YES         YES
Clock skew problems                       NO         YES    YES     NO          NO
Test data volume increase                 NO         NO     YES     NO          NO

s: number of segments, p: number of partitions, NG: negligible, NA: not available
Among the various scan power reduction techniques discussed in Section 1, those related to data [14] and power [4] gating schemes are the most effective, since they are capable of eliminating the signal transitions in the combinational logic. However, these techniques seriously affect the circuit performance in the normal mode of operation. Scan chain partitioning techniques are the next most effective. In more detail, we observe the following.
According to the experimental results in Table 2, the WTC reduction of the proposed scan testing technique is proportional to the product of the number of segments (s) and the number of partitions (p) in each segment (s·p times reduction with respect to the case of a single segment without partitions). In clock gating techniques, as in [32], the WTC reduction is proportional to the square of the number of segments in the design (s² times reduction). The scan enable gating technique [12] (possibly combined with clock gating) provides a theoretical maximum WTC reduction which is also proportional to the square of the number of segments in the design (s² times reduction). However, this reduction depends on the applied test data and the corresponding test response for each circuit under test, and in practice it is less than this upper bound. The combined use of the techniques presented in [9] and [22], as proposed in [22] for both scan-in and scan-out power minimization, provides WTC reductions that are proportional to s·p/(s+p−1) [22] (as the experimental results also confirm). Finally, note that for the X-filling technique in [19], which is a test data dependent technique, the maximum reported reduction for the ISCAS89 and ITC99 benchmark circuits is 1.72·s times (the average reduction for the circuits used is 1.37·s times).
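The closed-form reduction factors quoted above can be tabulated for the (s, p) configurations listed in Table 3. The following is a minimal sketch, not the authors' tooling: the function names and the `CONFIGS` list are hypothetical, and only the three closed-form expressions stated in the text are evaluated (the s² value is an upper bound for scan enable gating).

```python
# Illustrative sketch: evaluates the WTC reduction factors quoted in the text.
# All names here are hypothetical and chosen for this example only.

# (s, p) pairs: number of segments and number of partitions per segment.
CONFIGS = [(1, 1), (1, 4), (1, 8), (1, 16), (4, 4),
           (4, 8), (4, 16), (8, 8), (8, 16), (16, 16)]


def proposed_reduction(s, p):
    """Proposed technique: s*p times reduction."""
    return s * p


def clock_gating_reduction(s, p):
    """Clock gating: s^2 times reduction (p plays no role).

    The same expression is the theoretical upper bound for scan enable gating.
    """
    return s * s


def combined_reduction(s, p):
    """Combined scan-in/scan-out techniques: s*p / (s + p - 1) times reduction."""
    return s * p / (s + p - 1)


def reduction_table(configs=CONFIGS):
    """Return one tuple (s, p, proposed, clock_gating, combined) per configuration."""
    return [
        (s, p,
         proposed_reduction(s, p),
         clock_gating_reduction(s, p),
         round(combined_reduction(s, p), 2))
        for s, p in configs
    ]


if __name__ == "__main__":
    print(f"{'s':>3} {'p':>3} {'s*p':>5} {'s^2':>5} {'s*p/(s+p-1)':>12}")
    for s, p, prop, gated, comb in reduction_table():
        print(f"{s:>3} {p:>3} {prop:>5} {gated:>5} {comb:>12.2f}")
```

Running the script prints one row per configuration, which can be compared against the corresponding WTC reduction entries of Table 3.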
Table 3 summarizes the comparison results for the five techniques above, with respect to: the WTC reduction, the cell area overhead, the performance degradation in the normal mode, the test application time increase, the degradation of the coverage of un-modeled faults, the increase of clock skew related problems and the increase of the test data volume. Finally, note that the implementation cost due to the additional circuitry used in the proposed scheme is small and increases only slightly with the number of partitions (p). However, the routing cost of the MODEj signals must also be included in this cost, and it forms the main drawback of the proposed technique.
5 Conclusion
Low power scan operation is very important for circuit reliability during and after manufacturing testing. In this paper a scan partitioning technique is proposed, along with a scan hold mechanism, to significantly reduce the power dissipation during the shift phase in a scan chain. The proposed approach is applicable to both BIST and non-BIST based test schemes and can be easily combined with X-filling methodologies to also reduce the power consumption in the capture phase. It is test data compression friendly and can alleviate the effort of defect oriented X-filling techniques [3]. The test application time and the fault coverage are not affected by the application of the proposed technique. Furthermore, no scan cell reordering is required, nor is the use of clock, data or power supply gating techniques.
References
1. Almukhaizim S, Alsubaihi S, Sinanoglou O (2010) On the application of dynamic scan chain partitioning for reducing peak shift power. Springer J Electron Test Theory Appl 26:465–481
2. Arvaniti E, Tsiatouhas Y (2012) Low power scan by partitioning and scan hold, IEEE Symp. on Design and Diagnostics of Electronic Circuits and Systems, pp 262–265
3. Balatsouka S, Tenentes V, Kavousianos X, Chakrabarty K (2010) Defect aware x-filling for low-power scan testing, IEEE/ACM Design Automation and Test in Europe Conference, pp 873–878
4. Bhunia S, Mahmoodi H, Ghosh D, Mukhopadhyay S, Roy K (2005) Low-power scan design using first level supply gating. IEEE Trans VLSI Syst 13(3):384–395
5. Bonhomme Y, Girard P, Guiller L, Landrault C, Pravossoudovitch S (2001) A gated clock scheme for low power scan testing of logic ICs or embedded cores, IEEE Asian Test Symposium, pp 253–258
6. Butler K, Saxena J, Fryars T, Hetherington G, Jain A, Lewis J (2004) Minimizing power consumption in scan testing: pattern generation and DFT techniques, IEEE Int. Test Conference, pp 355–364
7. Chalkia M, Tsiatouhas Y (2012) The leafs scan-chain for test application time and scan power reduction, IEEE Int. Conference on Electronics, Circuits and Systems, pp 749–752
8. Chandra A, Chakrabarty K (2002) Low-power scan testing and test data compression for System-on-Chip. IEEE Trans CAD Integr Circ Syst 21(5):597–604
9. Chandra A, Ng F, Kapur R (2008) Low power Illinois scan architecture for simultaneous power and test data volume reduction, IEEE/ACM Design Automation and Test in Europe Conference, pp 462–467
10. Chiu M-H, Li J C-M (2005) Jump scan: a DFT technique for low power testing, IEEE VLSI Test Symposium, pp 277–282
11. Chosh S, Basu S, Touba N (2003) Joint minimization of power and area in scan testing by scan cell reordering, IEEE Comp Soc Annu Symp VLSI, pp 246–249
12. Czysz D, Kassab M, Lin X, Mrugalski G, Rajski J, Tyszer J (2008) Low power scan shift and capture in the EDT environment, IEEE International Test Conference, p 13.2
13. Dabholkar V, Chakravarty S, Pomeranz I, Reddy SM (1998) Techniques for minimizing power dissipation in scan and combinational circuits during test application. IEEE Trans CAD Integr Circ Syst 17(12):1325–1333
14. Gerstendorfer S, Wunderlich H (2000) Minimized power consumption for scan based BIST. J Electron Test Theory Appl 16(3):203–212
15. Girard P, Guiller L, Landrault C, Pravossoudovitch S (1999) A test vector ordering technique for switching activity reduction during test application, IEEE Great Lakes Symp. on VLSI, pp 24–27
16. Kim H-S, Kang S, Hsiao M (2008) A new scan architecture for both low-power testing and test volume compression under SoC test environment. Springer J Electron Test Theory Appl 24:365–378
17. Kim H-S, Kim C-G, Kang S (2008) A new scan partition scheme for low-power embedded systems. ETRI J 30(3):412–420
18. Lee I-S, Hur Y-M, Ambler T (2004) The efficient multiple scan chain architecture reducing power dissipation and test time, IEEE Asian Test Symposium, pp 94–97
19. Li J, Xu Q, Hu Y, Li X (2010) X-filling for simultaneous shift- and capture-power reduction in at-speed scan-based testing. IEEE Trans VLSI Syst 18(7):1081–1092
20. Nicolici N, Al-Hashimi B (2003) Power-constrained testing of VLSI circuits, Kluwer Academic Publishers
21. Remersaro S, Lin X, Zhang Z, Reddy S, Pomeranz I, Rajski J (2006) Preferred fill: a scalable method to reduce capture power for scan based designs, IEEE International Test Conference, p 32.2
22. Saeed SM, Sinanoglou O (2011) Expedited response compaction for scan power reduction, IEEE VLSI Test Symposium, pp 40–45
23. Samaranayake S, Sitchinava N, Kapur R, Amin MB, Williams TW (2002) Dynamic scan: driving down the cost of test. IEEE Comput 35(2):63–68
24. Sankaralingam R, Oruganti RR, Touba NA (2000) Static compaction techniques to control scan vector power dissipation, IEEE VLSI Test Symposium, pp 35–40
25. Sankaralingam R, Pouya B, Touba N (2001) Reducing power dissipation during test using scan chain disable, IEEE VLSI Test Symposium, pp 319–324
26. Sinanoglou O, Bayractaroglou I, Orailoglou A (2002) Test power reduction through minimization of scan chain transitions, IEEE VLSI Test Symposium, pp 166–171
27. Tudu J, Larsson E, Singh V, Agrawal V (2009) On minimization of peak power for scan circuit during test, IEEE European Test Symposium, pp 25–30
28. Vranken H, Waayers T, Fleury H, Lelouvier D (2001) Enhanced reduced-pin-count test for full-scan design, IEEE Int. Test Conference, pp 738–747
29. Wang S, Gupta SK (1998) ATPG for heat dissipation minimization during test application. IEEE Trans Comput 47(2):256–262
30. Wang L-T, Stroud C, Touba N (2008) System on chip test architectures, Morgan and Kaufmann Pub
31. Wen X, Miyase K, Suzuki T, Yamato Y, Kajihara S, Wang L-T, Saluja KK (2006) A highly-guided x-filling method for effective low-capture-power scan test generation, IEEE International Conference on Computer Design, pp 251–258
32. Whetsel L (2000) Adapting scan architectures for low power operation, IEEE Int. Test Conference, pp 863–872
33. Xiang D, Li K, Fujiwara H, Thulasiraman K, Sun J (2007) Constraining transition propagation for low-power scan testing using a two stage scan architecture. IEEE Trans Circ Syst II Exp Briefs 54(5):450–454
Efi Arvaniti received the B.S. degree in electronic and computer engineering in 2008 from the Technical University of Crete, Greece and the M.S. degree in computer science in 2011 from the University of Ioannina, Greece. His research interests include VLSI circuit design and design for testability.
Yiorgos Tsiatouhas received the B.S. degree in physics in 1990, the M.S. degree in electronic automation in 1993 and the Ph.D. degree in computer science in 1998, all from the University of Athens, Greece. From 1992 to 1996 he was with the National Center of Scientific Research “Demokritos”, Athens, Greece. From 1998 to 2002 he was with Integrated Systems Development (ISD) S.A. as cooperative projects director and technical manager of the Advanced Silicon Solutions Div. In 2002 he joined the Department of Computer Science and Engineering, University of Ioannina, Greece, where he is currently an Associate Professor. His research interests are in the area of integrated circuit design and design for testability. Dr. Tsiatouhas is a member of the EDAA, the IEEE and the IEEE Test Technology Technical Council. He was a member of the program committees of many conferences in the area of VLSI design and testing. He received the best paper awards of the 2002 IEEE International Symposium on Quality Electronic Design and the 2009 IEEE International Symposium on Design and Diagnostics of Electronic Circuits and Systems.