The current technology trend is toward smaller transistor nodes, which makes it easier to fit more transistors into the same area. Scaling down the technology node has both benefits and drawbacks. We reduce the supply voltage to protect the cells from an excessive electric field across the gate oxide and the conducting channel. Reducing the supply voltage lowers dynamic power dissipation, but it also slows down the CMOS transistor. As the voltage is reduced, static power dissipation can become equal to or greater than dynamic power dissipation. At lower nodes, leakage power can consume more than 50% of the overall chip power; hence, high-performance chips dissipate considerable power even in standby mode.
In the signoff phase, the focus is usually on how quickly we can close timing. We generally set power optimization aside while fixing timing with the standard VT swap and drive-strength change techniques. We tried a different approach to timing optimization that can also improve leakage power in lower technology node designs.
The power dissipated by a chip falls into two categories:
- Static Power (leakage power)
- Dynamic Power
Static power is the power consumed when there is no circuit activity, that is, when the circuit is in quiescent mode. As long as a supply voltage is present, the circuit continues to consume power even if we withdraw the clocks and do not change the circuit inputs. This is called static power consumption.
It is mainly due to the leakage currents that flow when the transistor is in an off state. There are many types of leakage currents. In the diagram below, we have shown only two common leakage currents.
In nanometre designs, leakage power is a major concern. Leakage power dissipation is mainly due to sub-threshold current, which increases exponentially as the threshold voltage is reduced. A lower threshold voltage also reduces the propagation delay, as the delay relation shows.
Delay:

$$T_d = \frac{C_L \cdot V_{dd}}{(V_{dd} - V_t)^{a}}$$

where:
$T_d$ – propagation delay
$C_L$ – load capacitance
$V_{dd}$ – supply voltage
$V_t$ – threshold voltage
$a$ – coefficient
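The trade-off between the delay relation and sub-threshold leakage can be sketched numerically. The snippet below is only an illustration: the supply voltage, the threshold voltages per VT flavour, the exponent, and the 90 mV/decade sub-threshold swing are all hypothetical values chosen for the example, not numbers from the library used in this study.

```python
def alpha_power_delay(c_load, vdd, vt, alpha):
    """Propagation delay from the relation Td = CL * Vdd / (Vdd - Vt)^a."""
    if vdd <= vt:
        raise ValueError("Vdd must exceed Vt for the cell to switch")
    return c_load * vdd / (vdd - vt) ** alpha


def relative_subthreshold_leakage(vt, vt_ref, swing_mv_per_decade=90.0):
    """Leakage relative to a reference Vt.

    Assumes leakage grows by one decade for every `swing_mv_per_decade`
    millivolts of Vt reduction (an assumed, illustrative swing).
    """
    return 10 ** ((vt_ref - vt) * 1000.0 / swing_mv_per_decade)


# Hypothetical operating point and VT flavours, for illustration only.
VDD, C_LOAD, ALPHA = 0.8, 1.0, 1.3
VT_FLAVOURS = {"SVT": 0.35, "LVT": 0.28, "ULVT": 0.22}

svt_delay = alpha_power_delay(C_LOAD, VDD, VT_FLAVOURS["SVT"], ALPHA)
for name, vt in VT_FLAVOURS.items():
    delay_ratio = alpha_power_delay(C_LOAD, VDD, vt, ALPHA) / svt_delay
    leak_ratio = relative_subthreshold_leakage(vt, VT_FLAVOURS["SVT"])
    print(f"{name}: delay x{delay_ratio:.2f}, leakage x{leak_ratio:.1f}")
```

With these assumed numbers, each step down in threshold voltage buys a modest delay reduction at the cost of a much larger leakage increase, which is why the VT choice deserves care.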
Figure 1: Leakage currents in a PMOS transistor
Dynamic power is the power consumed when the circuit is in operation, that is, when the supply voltage and clocks are applied and the inputs change. It is mainly due to dynamic currents, such as capacitance currents (switching power) and short-circuit currents (short-circuit power).
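As a rough illustration of why a lower supply voltage saves dynamic power, the switching component is commonly estimated as activity factor × load capacitance × Vdd² × frequency. This formula is not stated in the article; the sketch below uses it with hypothetical values only to show the quadratic dependence on the supply voltage.

```python
def switching_power(activity, c_load, vdd, freq):
    """Switching power estimate: activity * C * Vdd^2 * f (in watts)."""
    return activity * c_load * vdd ** 2 * freq


# Hypothetical values, for illustration only.
nominal = switching_power(activity=0.2, c_load=1e-9, vdd=1.0, freq=1e9)
scaled = switching_power(activity=0.2, c_load=1e-9, vdd=0.8, freq=1e9)
print(f"Switching power at 0.8 V relative to 1.0 V: {scaled / nominal:.2f}")
```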
We collected cell data from the library and compared it, as shown in the chart below, to see how timing signoff iterations can be done without taking too large a hit on leakage power.
The tables below show data for the victim cell that contributes to the path delay. Changing this cell’s down model to a suitable one can close the path. However, to choose a preferable down model, we must check its cell delay and leakage power. The portion highlighted with the red box shows that D2ULVT gives better timing with less leakage than D8LVT.
| Cell Type | Cell delay (ps) | Slack (ps) | Leakage (mW) |
|---|---|---|---|
| AN2D2SVT | 0.040564 | -0.00976 | 7.32E-06 |
| AN2D4SVT | 0.033764 | -0.00179 | 1.07E-05 |
| AN2D6SVT | 0.037529 | -0.00529 | 1.82E-05 |
| AN2D8SVT | 0.037254 | -0.00571 | 2.64E-05 |
| AN2D2LVT | 0.028544 | 0.00374 | 3.28E-05 |
| AN2D4LVT | 0.024059 | 0.008856 | 4.90E-05 |
| AN2D6LVT | 0.026528 | 0.006611 | 8.56E-05 |
| AN2D8LVT | 0.025945 | 0.00671 | 1.24E-04 |
| AN2D2ULVT | 0.02269 | 0.010199 | 1.20E-04 |
| AN2D4ULVT | 0.019279 | 0.014089 | 1.79E-04 |
| AN2D6ULVT | 0.021125 | 0.012461 | 3.12E-04 |
| AN2D8ULVT | 0.020622 | 0.01236 | 4.50E-04 |
| Cell Type | Cell delay (ps) | Slack (ps) | Leakage (mW) |
|---|---|---|---|
| AN2D2SVT | 0.044853 | -0.01703 | 7.32E-06 |
| AN2D4SVT | 0.040014 | -0.01239 | 1.07E-05 |
| AN2D6SVT | 0.043736 | -0.01586 | 1.82E-05 |
| AN2D8SVT | 0.044748 | -0.01945 | 2.64E-05 |
| AN2D2LVT | 0.031974 | -0.0028 | 3.28E-05 |
| AN2D4LVT | 0.028685 | -0.00043 | 4.90E-05 |
| AN2D6LVT | 0.031716 | -0.00322 | 8.56E-05 |
| AN2D8LVT | 0.032456 | -0.00633 | 1.24E-04 |
| AN2D2ULVT | 0.02527 | 0.004412 | 1.20E-04 |
| AN2D4ULVT | 0.022764 | 0.005812 | 1.79E-04 |
| AN2D6ULVT | 0.025085 | 0.003701 | 3.12E-04 |
| AN2D8ULVT | 0.025664 | 0.000603 | 4.50E-04 |
Table 1: Data based on a weak driver
Table 2: Data based on a strong driver
There are two possibilities: the driver of this cell can be weak or strong. The two tables and graphs represent the behaviour of the cell, in terms of its delay and leakage power, in each case.
Figure 2: Strong driver (D4SVT)
Figure 3: Weak Driver (D2SVT)
- In both cases, switching directly to ULVT/LVT costs us leakage power. We can instead choose the correct drive strength with a suitable VT and still close the timing. Also, note that D2ULVT gives more benefit than D8LVT.
Let’s take another experiment with a different cell, using the maximum and minimum net lengths. In both graphs, the drivers are strong relative to the net length. This further supports the case for choosing the right down model.
This case is free from crosstalk. In the minimum net length case, we only have to convert the cell from D2SVT to D4SVT to close the timing. In the maximum net length case, transition is the issue: we can break the net into several parts and then choose the correct down model.
- When cells sit close to other cells, the nets are not too long. Let’s see how the path delay and leakage behave together.
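The selection idea can be sketched as a small search over the library data: among the cells that meet a required slack, pick the one with the lowest leakage. The rows below are copied from the first AN2 table above; the function name and the `required_slack` parameter are hypothetical, and slack is in the table's own units.

```python
# (cell, slack, leakage_mW) rows copied from the first AN2 table.
CELLS = [
    ("AN2D2SVT", -0.00976, 7.32e-06),
    ("AN2D4SVT", -0.00179, 1.07e-05),
    ("AN2D6SVT", -0.00529, 1.82e-05),
    ("AN2D8SVT", -0.00571, 2.64e-05),
    ("AN2D2LVT", 0.00374, 3.28e-05),
    ("AN2D4LVT", 0.008856, 4.90e-05),
    ("AN2D6LVT", 0.006611, 8.56e-05),
    ("AN2D8LVT", 0.00671, 1.24e-04),
    ("AN2D2ULVT", 0.010199, 1.20e-04),
    ("AN2D4ULVT", 0.014089, 1.79e-04),
    ("AN2D6ULVT", 0.012461, 3.12e-04),
    ("AN2D8ULVT", 0.01236, 4.50e-04),
]


def cheapest_passing_cell(cells, required_slack=0.0):
    """Return the lowest-leakage cell whose slack meets the requirement."""
    passing = [cell for cell in cells if cell[1] >= required_slack]
    if not passing:
        return None  # no cell in this library subset closes the path
    return min(passing, key=lambda cell: cell[2])


print(cheapest_passing_cell(CELLS))        # just closing the path
print(cheapest_passing_cell(CELLS, 0.01))  # asking for extra margin
```

With this data, the first call selects AN2D2LVT and the second selects AN2D2ULVT, so neither case needs a high drive-strength LVT or ULVT cell.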
Figure 4: Behavior of slack when the net length is minimum
- We often deal with longer nets and transition issues. Let’s see how our analysis holds up with longer nets. Note that lower drive-strength cells give poor results in the longer-net case.
Figure 5: Behavior of cell when the net length is maximum
We also looked at what happens when the net is marginally long (around 100 um) and is the victim of 20 to 35 ps of crosstalk. All the experiments above were free from the crosstalk effect. Let’s add crosstalk to the mix now.
Crosstalk can be a key factor while fixing timing, and we cannot overlook it. Let’s look at a crosstalk-dominated timing path in which the driver has been boosted.
Figure 6: Behavior of crosstalk-dominated net when the driver is strong
- Let’s see what happens when we have a medium-strength driver with a certain amount of crosstalk. The data resembles the strong driver case: cell delay decreases for higher drive-strength cells, and the slack holds up well.
Figure 7: Behavior of crosstalk-dominated net when the driver is medium
- One cause of crosstalk is a long net in a congested area, made worse when it is driven by a weaker driver. This scenario shows a reversed cell-delay-to-slack characteristic: timing degrades when going from D24SVT to D2LVT. The same holds when moving from D24LVT to D2ULVT.
Figure 8: Behavior of crosstalk-dominated net when the driver is weak
While converting SVT cells to LVT/ULVT cells for timing optimization, we can keep the data below in mind and use it to optimize for both timing and power.
For the functional cell (AND cell) case mentioned in the first table, choosing D2ULVT over D8LVT gives us a substantial timing improvement while, surprisingly, costing less leakage power, something we generally miss while fixing timing.
| Cell name | Slack (ps) | Leakage (mW) |
|---|---|---|
| AN2D8LVT | -0.00633 | 1.24E-04 |
| AN2D2ULVT | 0.004412 | 1.20E-04 |

Table 3: D8LVT and D2ULVT
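The D8LVT versus D2ULVT comparison is one instance of a more general check: a cell is never worth choosing if another cell offers both better slack and lower leakage. A minimal sketch of that check follows, using the rows of the second AN2 table (the one Table 3 is drawn from); the helper names are hypothetical.

```python
# (cell, slack, leakage_mW) rows copied from the second AN2 table.
CELLS = [
    ("AN2D2SVT", -0.01703, 7.32e-06),
    ("AN2D4SVT", -0.01239, 1.07e-05),
    ("AN2D6SVT", -0.01586, 1.82e-05),
    ("AN2D8SVT", -0.01945, 2.64e-05),
    ("AN2D2LVT", -0.0028, 3.28e-05),
    ("AN2D4LVT", -0.00043, 4.90e-05),
    ("AN2D6LVT", -0.00322, 8.56e-05),
    ("AN2D8LVT", -0.00633, 1.24e-04),
    ("AN2D2ULVT", 0.004412, 1.20e-04),
    ("AN2D4ULVT", 0.005812, 1.79e-04),
    ("AN2D6ULVT", 0.003701, 3.12e-04),
    ("AN2D8ULVT", 0.000603, 4.50e-04),
]


def dominates(a, b):
    """True if a is at least as good as b on both axes and better on one."""
    _, a_slack, a_leak = a
    _, b_slack, b_leak = b
    return (a_slack >= b_slack and a_leak <= b_leak
            and (a_slack > b_slack or a_leak < b_leak))


def pareto_front(cells):
    """Cells that no other cell beats on both slack and leakage."""
    return [c for c in cells if not any(dominates(o, c) for o in cells)]


front = [name for name, _, _ in pareto_front(CELLS)]
print(front)
```

For this data only the D2 and D4 cells of each VT flavour remain on the front; AN2D8LVT drops out because AN2D2ULVT has both higher slack and lower leakage.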
Let's look at some practical scenarios:
- Based on the AND cell table, if we have five D2SVT cells and the path is violating by -20ps, we can convert D2SVT to D4SVT, gain five to six ps, and have a smaller impact on leakage power than we would by choosing two D4LVT cells.
| Cell Name | Slack | Leakage power (mW) | Scenario | Total leakage | Path slack |
|---|---|---|---|---|---|
| AN2D2SVT | -0.01703 | 7.32E-06 | 5 cells of D2SVT | 3.66E-05 | -20ps |
| AN2D4SVT | -0.01239 | 1.07E-05 | 5 cells of D4SVT | 5.37E-05 | +3ps |
| AN2D4LVT | -0.00043 | 4.90E-05 | 3 cells D2SVT + 2 cells D4LVT | 1.20E-04 | +2ps |
Table 4: Real-time scenario
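The total leakage column is simply the per-cell leakage multiplied by the cell count and summed over the path. A minimal sketch of that arithmetic, using the per-cell values from Table 4 (the dictionary and function names are hypothetical):

```python
# Per-cell leakage in mW, copied from Table 4.
LEAKAGE_MW = {
    "AN2D2SVT": 7.32e-06,
    "AN2D4SVT": 1.07e-05,
    "AN2D4LVT": 4.90e-05,
}


def total_leakage(cell_counts, leakage=LEAKAGE_MW):
    """Sum leakage over a path described as {cell name: number of cells}."""
    return sum(count * leakage[cell] for cell, count in cell_counts.items())


scenarios = {
    "5 x D2SVT": {"AN2D2SVT": 5},
    "5 x D4SVT": {"AN2D4SVT": 5},
    "3 x D2SVT + 2 x D4LVT": {"AN2D2SVT": 3, "AN2D4LVT": 2},
}
for name, counts in scenarios.items():
    print(f"{name}: {total_leakage(counts):.2E} mW")
```

Because the per-cell values in the table are rounded, a total recomputed this way can differ from the table in the last digit.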
We usually have a case like the one below, where mixed cell types are present in one path. We can observe from the table that, to bring a -25ps path to closure, we can convert D2SVT to D4SVT based on the earlier study, but we can optimize further by using two D4LVT cells in place of D16SVT cells, which helps close the timing with a smaller impact on leakage.
Table 5 shows the total leakage of D2SVT + D16SVT, D4SVT + D16SVT, and D2SVT + D16SVT + D4LVT.
| Practical scenario (all cells are inverters) | Total leakage | Path slack |
|---|---|---|
| 3 D2SVT (3.93E-06 * 3) + 3 D16SVT (3.35E-05 * 3) | 1.12E-04 | -15 ps |
| 3 D4SVT (7.22E-06 * 3) + 3 D16SVT (3.35E-05 * 3) | 1.22E-04 | -2 ps |
| 3 D2SVT (3.93E-06 * 3) + 1 D16SVT (3.35E-05 * 1) + 2 D4LVT (3.09E-05 * 2) | 1.07E-04 | +8 ps |
Table 5: Practical Scenario
Table 6 summarizes how D4LVT has less delay and less leakage power than some SVT cells, which we can use while optimizing timing.
- We converted cells to different down models, and the table below shows the impact on slack. For example, if we have D18SVT and convert it to D4LVT, we see positive slack in the path and less leakage.
| Cell Name | Path slack | Leakage | Slack diff | Leakage diff |
|---|---|---|---|---|
| INVD4LVT | 0.006295 | 3.09E-05 | | |
| INVD16SVT | -0.009806 | 3.35E-05 | ▲0.016101 | ▼-2.63E-06 |
| INVD18SVT | -0.012951 | 3.77E-05 | ▲0.019246 | ▼-6.82E-06 |
| INVD20SVT | -0.014544 | 4.07E-05 | ▲0.020839 | ▼-9.87E-06 |
| INVD22SVT | -0.014544 | 4.80E-05 | ▲0.020839 | ▼-1.72E-05 |
Table 6: Summary of D4LVT vs High strength SVTs
- The case below arises when the driver is weak and cannot drive bigger cells. In such a scenario, choosing D2ULVT is preferable (all cells are inverters). Surprisingly, despite being a ULVT cell, it has less leakage than the LVT cells, and it can be a key factor in saving power without affecting timing.
Most of us have leakage optimization flows that convert cells from ULVT to LVT based on timing, but the table below shows that even if we have good timing with a D8 or D12 LVT cell, we can convert it to a D2ULVT cell and have less leakage.
| Cell | Path slack | Leakage | Slack diff | Leakage diff |
|---|---|---|---|---|
| INVD2ULVT | 0.0051 | 6.18E-05 | | |
| INVD8LVT | 0.003844 | 6.21E-05 | ▲0.0013 | ▼3.20E-07 |
| INVD10LVT | 0.002791 | 7.84E-05 | ▲0.0023 | ▼1.66E-05 |
| INVD12LVT | 0.001499 | 9.49E-05 | ▲0.0036 | ▼3.31E-05 |
| INVD14LVT | -0.000925 | 1.12E-04 | ▲0.0060 | ▼4.97E-05 |
| INVD16LVT | -0.003176 | 1.51E-04 | ▲0.0083 | ▼8.89E-05 |
| INVD18LVT | -0.005885 | 1.66E-04 | ▲0.0110 | ▼1.04E-04 |
| INVD20LVT | -0.007082 | 1.83E-04 | ▲0.0122 | ▼1.22E-04 |
| INVD22LVT | -0.007082 | 2.17E-04 | ▲0.0122 | ▼1.55E-04 |
| INVD24LVT | -0.011441 | 2.50E-04 | ▲0.0165 | ▼1.88E-04 |
Table 7: Summary of D2ULVT vs high strength LVTs
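The slack diff and leakage diff columns can be reproduced as the gain from swapping each LVT inverter for INVD2ULVT. The sketch below assumes the diffs are simple differences of the tabulated path slack and leakage; a few rows are copied from the table, and the function name is hypothetical.

```python
# (cell, path_slack, leakage) rows copied from the inverter table above.
REPLACEMENT = ("INVD2ULVT", 0.0051, 6.18e-05)
CURRENT_CELLS = [
    ("INVD8LVT", 0.003844, 6.21e-05),
    ("INVD12LVT", 0.001499, 9.49e-05),
    ("INVD16LVT", -0.003176, 1.51e-04),
    ("INVD24LVT", -0.011441, 2.50e-04),
]


def swap_gain(current, replacement):
    """Return (slack gained, leakage saved) when swapping current for replacement."""
    _, cur_slack, cur_leak = current
    _, new_slack, new_leak = replacement
    return new_slack - cur_slack, cur_leak - new_leak


for cell in CURRENT_CELLS:
    slack_gain, leak_saved = swap_gain(cell, REPLACEMENT)
    # A swap is attractive only when both numbers are positive.
    print(f"{cell[0]} -> {REPLACEMENT[0]}: "
          f"slack +{slack_gain:.4f}, leakage saved {leak_saved:.2E}")
```

A leakage-recovery flow could use a check of this kind to consider a low drive-strength ULVT cell as a candidate instead of only moving from ULVT to LVT.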
Limitation:
- This study has one limitation: when the net is too long, it causes a high net delay, and when the net is the victim of crosstalk, the delay is also high. In those cases, we cannot choose D2ULVT or D4LVT cells over high drive-strength cells.
- As mentioned, in the crosstalk-dominated case where the driver is weak and the net is very long, smaller cells such as D2ULVT tend to show a reversed characteristic on the timing scale.
- The data below shows that D2ULVT makes the path slack more negative than D24LVT. We must watch for these cases while applying the above strategy.
| Cell Name | Long nets/crosstalk path slack | Normal path slack | Leakage |
|---|---|---|---|
| BUFD24SVT | -0.24288 | -0.040899 | 6.64E-05 |
| BUFD2LVT | -0.364352 | -0.007607 | 3.09E-05 |
| | | | |
| BUFD24LVT | -0.217539 | -0.00642 | 3.00E-04 |
| BUFD2ULVT | -0.320156 | 0.016439 | 1.13E-04 |
Table 8: Limitation
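This limitation can be expressed as a guard that a swap must pass before it is applied: the replacement must not degrade slack in the context the cell actually sits in. The sketch below uses the Table 8 values; the context labels and the function name are hypothetical.

```python
# Slack and leakage values copied from Table 8.
BUFFERS = {
    "BUFD24SVT": {"long_net": -0.24288, "normal": -0.040899, "leakage": 6.64e-05},
    "BUFD2LVT": {"long_net": -0.364352, "normal": -0.007607, "leakage": 3.09e-05},
    "BUFD24LVT": {"long_net": -0.217539, "normal": -0.00642, "leakage": 3.00e-04},
    "BUFD2ULVT": {"long_net": -0.320156, "normal": 0.016439, "leakage": 1.13e-04},
}


def swap_is_safe(current, replacement, context):
    """True if the replacement does not worsen slack in the given context.

    context is "normal" or "long_net" (long net or crosstalk-dominated path).
    """
    return BUFFERS[replacement][context] >= BUFFERS[current][context]


for context in ("normal", "long_net"):
    ok = swap_is_safe("BUFD24LVT", "BUFD2ULVT", context)
    print(f"BUFD24LVT -> BUFD2ULVT on a {context} path: {'ok' if ok else 'reject'}")
```

On a normal path the swap to the smaller cell passes the guard, while on a long or crosstalk-dominated net it is rejected, matching the behaviour described in the limitation.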
- From the graphs above, we can state the relationship between cell delay and leakage, which is another way of viewing the relationship between drive strength and leakage power.
- We do not suggest always choosing a certain drive strength or VT over another. The choice depends entirely on the problem that causes the timing failure.
- When the driver is weak, we should boost the driver first, and when the net is long, we should break the net. We generally use a D8 or D16 buffer, or an equivalent inverter pair, to break the net. This study can now help us choose the appropriate drive strength for that buffer or inverter.
- This method applies not only while fixing timing; we can also propose it in earlier stages, where we have margin in the timing path and can convert cells based on the suitable case.
Based on the experiments above, we can state that choosing the right VT and drive strength benefits us in terms of leakage power. We usually consider ULVT a power-hungry VT type, but this paper shows that this is not always true. We must still take care of cases such as crosstalk-dominated paths and long nets, and address each according to its own problem.
Using the power optimization technique described above, we met timing while saving as much leakage power as possible. The aim is to find the optimal trade-off point between the leakage power and the delay of a cell, keeping lower technology node designs in mind. The results were convincing: if we fix timing with these points in mind, we can save a lot of power.
About the Authors
Prerak Dalia works as an Engineer at eInfochips. He holds a Bachelor of Engineering degree in Electronics and Communication from CSPIT, Changa, India. With over 3.6 years of experience in physical design, he has developed expertise in PnR, physical verification, and static timing analysis.
Jaimini Prajapati works as a Senior Engineer (Level 2) at eInfochips. She holds a Master of Engineering degree in VLSI from LCIT, Bhandu, India and possesses over 6.5 years of experience in lower technology nodes for complex networking SoCs. She has developed expertise in PnR, static timing analysis, and physical verification.
Abhishek Chhajer is a Senior Technical Lead at eInfochips. He holds a Bachelor of Engineering degree in Information Communication Technology (ICT) from DAIICT Engineering College, India. With over 9.5 years of experience in physical design, he has developed expertise in PnR, physical verification, static timing analysis, full-chip timing, and full-chip physical verification.
Kapil Saxena is an ASIC Delivery Manager (Level 2) at eInfochips. He holds a Bachelor of Engineering degree in Electronics and Communication from IT BHU, India. With over 25 years of experience in the physical design of SoCs, he delivers complex SoC products and manages large teams.