# Design of Multi-bit Pulsed Latches with Scan Input in CMOS ONK65 Technology

## Vojtech KRAL

Dept. of Radio Electronics, Brno University of Technology, Technicka 12, 616 00 Brno, Czechia; ON Design Czech s. r. o., Videnska 204/125, 619 00 Brno, Czechia

## xkralv03@vutbr.cz or vojtech.kral@onsemi.com

Submitted August 18, 2023 / Accepted October 11, 2023 / Online first November 6, 2023

Abstract. This paper presents a new multi-bit pulse latch design that places innovative emphasis on the integration of scan input for automatic test pattern generation (ATPG). Two different designs have been developed in ONK65 technology (65 nm process): the first with standard threshold voltage (SVT) tailored for consumer products and the second with high threshold voltage (HVT) for automotive, each addressing specific aspects of process, voltage, and temperature (PVT). Multi-bit pulse latches offer a more efficient alternative to multi-bit flip-flop circuits and promise significant power and area savings. However, the efficiency of these latches depends on the technology, library type and customer requirements. A multi-bit pulse latch consists of a pulse generator and a pulsed latch. Each component is carefully designed for its specific purpose and the most appropriate topology is selected. Furthermore, the paper serves as a comprehensive guide to the design of low-power digital cells. It rethinks the topology design approach by emphasizing the scan input and presents simulation results for both components of the multi-bit pulse latch, highlighting their advantages. The results show that a less strict PVT offers greater benefits than a strict PVT.

## Keywords

5G chips, area-friendly design, automotive, consumer flip-flops, digital standard cell, dynamic power, leakage, low power chips, multi-bit pulsed latch, pulsed latch, saving area, scan mode, serial shifter, static power

# 1. Introduction

An enormous number of semiconductor chips is manufactured for all sorts of industries. These chips are used in modern telecommunications such as 5G, information technology, consumer electronics, healthcare, industrial applications, and automotive systems [1]. They can be developed in different technologies with different power consumption, and they can also be developed with low-power or ultra-low-power solutions that minimize the power consumption of the chip [2]. For example, handheld devices (cell phones) have a limited power supply because they are powered by batteries. If the power consumption is not optimized, either the battery life will be short or the battery will have to be huge to power the unoptimized device. Both outcomes are undesirable. Also, some topologies can be replaced with more area-efficient topologies [3].

This paper begins with a short introduction to digital standard cell libraries and their basics. The next section explains where CMOS chips lose power and how the power consumption can be optimized. The most recent concern is leakage power, which is becoming dominant in technologies smaller than 65 nm [4]. Today's chips can be designed with low-power or even ultra-low-power methods that reduce power consumption across the entire chip, and the saved power can extend battery life [2]. A multi-bit pulsed latch can be used as one of the low-power techniques [5–7]. Multi-bit pulsed latches are described and compared to multi-bit flip-flops in detail in the following sections to give the reader a better understanding of this digital standard cell as a better solution for advanced designs.

Multi-bit pulsed latches with a scan input are the main topic of this paper because they can be used as a better replacement for the multi-bit flip-flops commonly used in digital standard cell libraries. This approach can significantly improve power consumption within a specific CMOS process, but its effectiveness differs between processes. Multi-bit pulsed latches with a scan input are topologically more complicated than the standard multi-bit pulsed latches shown in the literature [5–7] because the width of the pulse must stay within an exact range. The pulse can be strongly influenced by PVT (process, voltage, temperature) sets. The designed multi-bit pulsed latch needs to cover all PVTs to provide maximal functionality. Scan input is a challenging feature that can be difficult to design for these PVT sets. The scan input is a novelty in this multi-bit pulsed latch because it makes the behavior of the cell more complex, so the cell needs to be designed precisely. The scan input is used in automatic test pattern generation (ATPG) for digital cell analysis. In automotive designs, scan input is commonly used and in some cases is even necessary for correct function. This type of cell with scan input has not been published in any article because it is difficult to achieve a working design under these conditions and it is a specific application for high-reliability designs. Multi-bit pulsed latches should have a smaller power consumption and even a smaller area [5–7], but these savings depend on the application where the chip will be used (based on PVTs) and on the required features (scan input). For this paper, multi-bit pulsed latches were designed mainly in an automotive library with its strict PVT sets, but also in a consumer library to see the difference between libraries with different purposes and requirements on final products (acceptable features and conditions differ between the automotive and consumer industries).

## 2. Digital Standard Cell Libraries

Digital standard cells are an irreplaceable part of chips, serving as well-defined cells that can be used in a design as building blocks. These building blocks are also pre-characterized to save time when they are used in a large design [7]. Standard means that cells in a single library have the same template (layout) height, as this makes it easy to connect these cells to the power rails and place them in the matrix. The performance of the libraries is given by the height of the cells. Digital libraries are often developed as ultra-high-density (UHD), high-density (HD), or high-speed (HS). High-speed libraries have greater height, higher speed, and higher power consumption than ultra-high-density and high-density libraries [7]. The height may also vary depending on the customer's requirements. Special libraries called power management kit (PMK) or power optimized kit (POK) can be available, too. These libraries contain special low-power or very low-power cells that can be used in designs to optimize power consumption.

Many types of digital standard cell libraries can be found in a technology, depending on the application or customer requirements. The width of a cell is given by its complexity, but it should be minimized to achieve maximal design density. These libraries can also be designed with different transistor models that have different threshold voltages. Libraries usually contain low threshold voltage (LVT), standard threshold voltage (SVT), or high threshold voltage (HVT) devices [8].

These digital standard cells have different views that are used for different applications, such as schematic, layout, symbol, Verilog, liberty, cell-aware, Voltus, and others. The pre-characterized data for large simulations are stored in liberty for exact process, voltage, and temperature (PVT) sets. Different libraries require different sets of PVTs depending on the application or customer requirements. Digital standard cell libraries are always simulated at cross corners, which cover the two worst-case scenarios that can occur in a design: the first in which the design is slow, and the second in which it is fast. These are referred to as the slow corner and the fast corner. The pre-characterization in cross corners provides some knowledge



Fig. 1. Schematic and layout of inverter in 65 nm technology: a) topology, b) cell layout.

| Name of cell | Description of the cells |
|---|---|
| BUF, INV, AND, OR, NAND, NOR, XOR, XNOR | Simple logical functions with multi-inputs and different output strengths |
| HALF / FULL ADDER | 2-bit half or full adder with different output strengths |
| MUX / DEMUX | Multiplexer or demultiplexer with different output strengths |
| ECO CELLS | Universal cells that can be used in case of need |
| AOI / OAI | Multi-input AND/OR or OR/AND logical combination |
| FLIP-FLOPS / SCAN FLIP-FLOPS | Flip-flops (plain, reset, set) with different output strengths, scan flip-flop can be used as a shifter |
| LATCHES | Flip-flop controlled with level |
| FILLER / FILLCAP | The cell can connect power rails or can be used as decoupling capacitances |
| CLOCK GATING CELLS / DELAYS | Used to synchronize the clock signal, delays are used for compensation of STA violations |

Tab. 1. List of common cells in the digital standard cell library.

about the functionality of the cell in different conditions. The two main corners can also be accompanied by complementary PVTs that can be used in different cases. For example, if the library is mainly used in the automotive industry, the main corners would be very strict. There can also be some basic PVTs that can be used in medical, commercial, or other industries [9]. In some cases, it is better to have a library tailored to a customer, because the area and power consumption can be slightly different.

The simplest cell is shown in Fig. 1. It represents a simple layout of the inverter which is composed of two transistors. The height of these transistors in Fig. 1 is not the same because the PMOS transistor is slower than the NMOS transistor with the same height. The height ratio of both transistors is given by the technology or demands. These libraries can be composed of many types of common cells. The generally available cells that are commonly found in the digital standard library (core) are shown in Tab. 1. This paper is primarily focused on a digital standard cell as a multi-bit pulsed latch that can be used as a better replacement for multi-bit flip-flops [5–7].

# 3. Power Consumption in CMOS Technology

Power consumption in CMOS technology is a major concern today, so its components are explained here. Power consumption can be divided into two main groups: dynamic power and static power. Dynamic power is caused by circuits changing state and consists of switching power and short-circuit power. Static power refers to leakage power. Each component has its own parasitic effects that can affect the resulting power consumption. The total power consumption is given by equation (1) [10]:

$$P_{\text{total}} = P_{\text{switching}} + P_{\text{short}} + P_{\text{leakage}} \tag{1}$$

The contributors to the total power consumption are not the same for all technology processes, as shown in Fig. 2. Over the years, technology processes have changed, and so have the magnitudes of dynamic and static power: dynamic power increases slightly, but static power increases significantly. Static power becomes dominant when the technology reaches the 65 nm process. This needs to be addressed, or at least investigated, to gain insight into the leakage power of the following design. The fundamental principle of each power component is explained using the inverter shown in Fig. 1 in the following subsections [10].

#### 3.1 Switching Power

Switching power is caused by charging and discharging the capacitance at the output node. This node capacitance mainly includes gate, overlap, and interconnect capacitance [15]. That is why the layout needs to be designed precisely to minimize parasitic elements. The switching power can be expressed as (2):

$$P_{\text{switching}} = \alpha \cdot C_{\text{L}} \cdot f \cdot V_{\text{DD}}^2 \tag{2}$$

where  $\alpha$  is the switching activity factor of the clock,  $C_{\rm L}$  is the capacitance load connected to the output stage, f is the frequency of the clock and  $V_{\rm DD}$  is the power supply voltage of the cell [7].



Fig. 2. Dependence of normalized power on years [11].

The equation shows that the switching power depends on several quantities that are easily observable and measurable in CMOS circuits. These parameters need to be tailored to the application. Frequency and voltage need not be high if the circuit performance does not require it. The switching activity factor of the clock is typically taken as 0.5, which corresponds to a 50% duty cycle. In special applications, the switching activity factor can be different. The capacitive load, which depends on the layout and the output load, needs to be minimized to reduce the switching power. The switching power can be optimized with methods that change the parameters in (2) [7].
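As a minimal numerical sketch of (2), the following Python snippet evaluates the switching power for one set of values. The function name and the chosen operating point (0.5 activity, 20 fF load, 100 MHz, 1.2 V) are illustrative assumptions and are not simulation results from this work.

```python
def switching_power(alpha: float, c_load: float, freq: float, vdd: float) -> float:
    """Switching power in watts according to Eq. (2): alpha * C_L * f * V_DD^2."""
    return alpha * c_load * freq * vdd ** 2


# Illustrative operating point (assumed values, not measured data).
p_sw = switching_power(alpha=0.5, c_load=20e-15, freq=100e6, vdd=1.2)
print(f"P_switching = {p_sw * 1e6:.2f} uW")

# Halving the supply voltage reduces the switching power to one quarter.
p_sw_half_vdd = switching_power(alpha=0.5, c_load=20e-15, freq=100e6, vdd=0.6)
print(f"ratio = {p_sw_half_vdd / p_sw:.2f}")
```

The quadratic dependence on $V_{\rm DD}$ is the reason why lowering the supply voltage is usually the most effective of the parameters in (2).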

#### 3.2 Short-circuit Power

In digital CMOS circuits, there are always two complementary networks: the p-network (pull-up) and the n-network (pull-down). The inverter mentioned above can be used as an example. Normally, when the input and output states are stable, only one transistor is turned on and connects the output either to the power supply node or to the ground node. The other network is turned off and blocks the current from flowing. There is also a transition state called short-circuit. It occurs while the circuit switches from a low logical state to a high logical state, or vice versa, and both transistors are partially on [17]. Current then flows through both types of transistors from the power supply to the ground. The short-circuit power can be expressed as (3):

$$P_{\rm short} = I_{\rm SC} \cdot V_{\rm DD} \cdot f \cdot t_{\rm SC} \tag{3}$$

where $I_{SC}$ is the short-circuit current that flows during the transition, $V_{DD}$ is the power supply voltage, $f$ is the switching frequency and $t_{SC}$ is the duration of this transition state.

Parameters of (3) such as supply voltage and switching frequency need to be optimized: if the circuit does not need to be fast, the applied voltage and frequency can be lower, which helps reduce the power consumption [7]. $I_{SC}$ can be optimized through the output stage of the cell. A digital standard cell library should contain multiple variants of one cell that differ in the output stage, that is, cells designed with different driving strengths. It is up to the designer which cells to use. In practice, however, the cell with the most appropriate driving strength is usually chosen by algorithms, which must be set up correctly.

A suitable cell with optimal driving strength can save power. Dynamic power is the sum of switching and short-circuit power. Dynamic power is more significant for technologies larger than 65 nm, as shown in Fig. 2. For technologies smaller than 65 nm, dynamic power is less significant than the static power of the whole chip because of leakage, which is explained in the next subsection [11].
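The sum of (2) and (3) can be sketched in a few lines of Python. The short-circuit current of 10 µA and the transition time of 50 ps are hypothetical values chosen only to make the arithmetic concrete; the remaining values are the same illustrative operating point of 0.5 activity, 20 fF, 100 MHz and 1.2 V.

```python
def short_circuit_power(i_sc: float, vdd: float, freq: float, t_sc: float) -> float:
    """Short-circuit power in watts according to Eq. (3): I_SC * V_DD * f * t_SC."""
    return i_sc * vdd * freq * t_sc


def dynamic_power(alpha: float, c_load: float, freq: float, vdd: float,
                  i_sc: float, t_sc: float) -> float:
    """Dynamic power as the sum of switching power (2) and short-circuit power (3)."""
    p_switching = alpha * c_load * freq * vdd ** 2
    return p_switching + short_circuit_power(i_sc, vdd, freq, t_sc)


# Hypothetical values: I_SC = 10 uA and t_SC = 50 ps are assumptions.
p_short = short_circuit_power(i_sc=10e-6, vdd=1.2, freq=100e6, t_sc=50e-12)
p_dynamic = dynamic_power(alpha=0.5, c_load=20e-15, freq=100e6, vdd=1.2,
                          i_sc=10e-6, t_sc=50e-12)
print(f"P_short   = {p_short * 1e9:.1f} nW")
print(f"P_dynamic = {p_dynamic * 1e6:.2f} uW")
```

With these assumed numbers the short-circuit term is a small fraction of the dynamic power; the actual ratio depends on the input slopes and the driving strength of the cell.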

#### 3.3 Leakage Power (Static Power)

The static power of the whole chip becomes dominant when the process technology reaches 65 nm which is exactly what this paper deals with, and it is the reason why the leakage will be investigated [11]. Even turned-off transistors can consume some amount of power. This power loss is called leakage, which can be caused by a variety of problems. Several causes will be explained in this subsection to give a basic idea of leakage power. As it was stated earlier, the leakage power is dependent on technology.

With smaller technology nodes, leakage is more dominant than dynamic power, as shown in Fig. 2. For nanometer devices, leakage current is dominated by subthreshold leakage, thin-oxide tunneling leakage, and reverse-bias pn-junction leakage [7]. There are other leakage components, such as drain-induced barrier lowering and gate-induced drain leakage, but they are not significant in comparison with these three dominant leakages. These dominant leakages are shown in Fig. 3 [4]. The leakage power is expressed as a function of many variables (4):

$$P_{\text{leakage}} = f\left(V_{\text{DD}}, V_{\text{th}}, \frac{W}{L}, T\right) \tag{4}$$

where $V_{DD}$ is the power supply voltage, $V_{th}$ is the threshold voltage of the transistor, $W/L$ is the size of the transistor (where $W$ is the width and $L$ the length), and $T$ is the temperature in kelvins. Switching power and short-circuit power have frequency as a variable in (2) and (3). Dynamic power is dissipated in cycles, whereas leakage power, with no frequency in function (4), is dissipated continuously; that is why leakage power is called static power [17].

#### 3.3.1 Subthreshold Leakage


The subthreshold leakage current  $I_{SUB}$  is a current which can flow between drain and source if the transistor is in weak inversion. A weak inversion occurs when the gate to source voltage  $V_{GS}$  is smaller than the threshold voltage




Fig. 3. Leakage components are shown in layout and schematic (I<sub>1</sub>-Diode reverse bias current, I<sub>2</sub>-Subthreshold current, I<sub>3</sub>-Tunneling into and through gate oxide, I<sub>4</sub>-Injection of hot carriers from substrate to gate oxide, I<sub>5</sub>- Gate-induced drain leakage, I<sub>6</sub> - Punchthrough) [4].

$V_{\rm th}$ of the transistor. This current flows because the region between drain and source contains a small concentration of minority carriers, which allows current to pass from drain to source. This leakage current depends on transistor parameters such as the power supply voltage, width, length, process, temperature, and type of the transistor [4]. It can be reduced with special topologies. For example, a standard inverter (based on 2 transistors) can be replaced with a stacked inverter (based on 4 transistors) [12]. Topology can be a good solution for subthreshold leakage reduction, but at the cost of cell area. Subthreshold conduction can also be used as an advantage. For example, it can be used in ultra-low power analog circuits, especially in dynamic random-access memories [7].

#### 3.3.2 Thin Gate-Oxide Tunneling Leakage

The silicon dioxide (insulator) between the active area and the gate contact is thin, only a few atoms thick in modern technologies. Because the insulator is so thin, two types of leakage can occur [7]. The first is the thin gate tunneling current $I_{\text{TUNNEL}}$, generated by carriers tunneling through the thin insulator. The second is the hot carrier injection current $I_{\text{HC}}$, caused by carriers whose kinetic energy is high enough to overcome the gate potential barrier and pass through the thin insulator. This effect is more common for electrons because their voltage barrier and effective mass are smaller than those of holes [4], [16].

#### 3.3.3 Reverse-Bias pn-Junction Leakage

The last type of leakage explained here is reverse-bias pn-junction leakage, which is caused by the structure of the CMOS transistor. In the structure of CMOS transistors, shown in Fig. 3, pn-junctions are formed between the active area and the substrate. Such a pn-junction acts like a well-formed diode [7], [15]. Even when this diode is reverse-biased, it still conducts a small current $I_D$ that cannot be neglected. This reverse-bias diode current can be expressed as (5):

$$I_{\rm D} = I_{\rm S} \left( e^{\frac{V_{\rm DB}}{V_{\rm T}}} - 1 \right) \tag{5}$$

where $I_S$ is the reverse saturation current (a device parameter), $V_{DB}$ is the voltage between the drain and the body of the transistor and $V_T$ is the thermal voltage, which depends on temperature. At room temperature (T = 300 K) the value of $V_T$ is 26 mV [7]. The thermal voltage can be expressed as (6):

$$V_{\rm T} = \frac{kT}{q} \tag{6}$$

where $k$ is the Boltzmann constant ($1.38 \times 10^{-23}$ J/K), $T$ is the absolute temperature in kelvins and $q$ is the electron charge [13].
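Equations (5) and (6) can be checked numerically with a short Python sketch. The saturation current of 1 fA is a hypothetical device parameter, and the sketch assumes the sign convention in which a reverse-biased junction corresponds to a negative voltage in the exponent of (5).

```python
import math

K_BOLTZMANN = 1.380649e-23   # Boltzmann constant in J/K
Q_ELECTRON = 1.602176634e-19  # elementary charge in C


def thermal_voltage(temp_k: float) -> float:
    """Thermal voltage V_T = k*T/q in volts, Eq. (6)."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON


def diode_current(i_s: float, v_db: float, temp_k: float) -> float:
    """Junction current I_D = I_S * (exp(V_DB / V_T) - 1) in amperes, Eq. (5)."""
    return i_s * math.expm1(v_db / thermal_voltage(temp_k))


v_t = thermal_voltage(300.0)
# Hypothetical device: I_S = 1 fA, junction reverse-biased by 1.2 V.
i_leak = diode_current(i_s=1e-15, v_db=-1.2, temp_k=300.0)
print(f"V_T at 300 K = {v_t * 1e3:.2f} mV")
print(f"I_D at reverse bias = {i_leak:.3e} A")
```

For a reverse bias of many thermal voltages the exponential term vanishes and the current saturates at $-I_S$, so the junction leakage is set by the device parameter $I_S$ rather than by the exact bias.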

## 4. New Multi-bit Pulsed Latch

A multi-bit pulsed latch is a new low-power solution that can be used as a replacement for standard multi-bit flip-flops. Multi-bit pulsed latches can be extended with a scan input (serial shifter), which is nothing but a multiplexer on the input. The multi-bit pulsed latch is composed of a pulse generator and pulsed latches, as shown in Fig. 4 [6]. Multi-bit means that more than one latch is driven by one pulse generator, as shown in Fig. 4. The pulsed latch is a simple latch that is driven by a short pulse instead of a standard clock source. A latch driven by a pulse has the same truth table as a flip-flop, which means that the flip-flop can be replaced by a pulsed latch. A sufficiently wide pulse is generated in a pulse generator. The pulse generator can be developed as a global or a local one, as shown in Fig. 5. The global pulse generator is developed separately from the latch and is used less frequently because of parasitic elements (longer routing than with the local pulse generator), more complicated characterization, and a more complicated place and route phase. The local pulse generator is a part of the multi-bit pulsed latch. With the local pulse generator, there are still parasitic elements, but they are kept to a minimum, and the generator is designed for the exact purpose of driving a known number of pulsed latches [6].
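The behavior described above can be illustrated with a purely logical Python model of one pulsed latch with a scan multiplexer on its input. This is a behavioral sketch with no timing; the class and function names are hypothetical, and the signal names SE and SI follow the usual scan enable and scan input convention.

```python
def scan_mux(d: int, si: int, se: int) -> int:
    """Select the scan input SI when scan enable SE is high, else the data input D."""
    return si if se else d


class PulsedLatch:
    """Level-sensitive latch that is transparent only while the pulse is high."""

    def __init__(self) -> None:
        self.q = 0

    def step(self, pulse: int, d: int, si: int = 0, se: int = 0) -> int:
        if pulse:  # transparent phase: output follows the selected input
            self.q = scan_mux(d, si, se)
        return self.q  # opaque phase: output holds the stored value


latch = PulsedLatch()
trace = [
    latch.step(pulse=1, d=1),              # normal mode, captures D = 1
    latch.step(pulse=0, d=0),              # pulse low, holds the stored 1
    latch.step(pulse=1, d=0, si=1, se=1),  # scan mode, captures SI = 1
    latch.step(pulse=1, d=0, si=1, se=0),  # normal mode, captures D = 0
]
print(trace)
```

As long as the input is stable while the pulse is high, the stored value changes once per clock cycle, which is why the pulsed latch can stand in for an edge-triggered flip-flop.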

#### 4.1 Pulse Generator

The aim of this paper is to limit power consumption and decrease the area of the cell in comparison to a multi-bit flip-flop. This means that the chosen topology of the pulse generator should be simple yet power efficient. The pulse can be generated at different events, such as the rising edge, the falling edge, or both edges of the clock source. This subsection deals with pulse generators that generate a pulse on the rising edge of the clock signal. At first sight, the most suitable circuit for the pulse generator seems to be a 2-input logical AND gate with an inverter on one of the inputs, which creates the delay, as shown in Fig. 6. This circuit is also known as a glitch generator [5, 6, 18].
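The principle of the glitch generator can be sketched with an idealized discrete-time model in Python. The inverter is modeled as an ideal delay followed by an inversion, the gates have no delay of their own, and the 10 ns period, 200 ps delay and 10 ps time step are assumptions made only for this illustration.

```python
def clock(t_ps: int, period_ps: int = 10_000) -> int:
    """Ideal clock with 50 % duty cycle: high during the first half of each period."""
    return 1 if (t_ps % period_ps) < period_ps // 2 else 0


def pulse_generator(t_ps: int, delay_ps: int = 200, period_ps: int = 10_000) -> int:
    """Ideal glitch generator: CLK AND NOT(CLK delayed by delay_ps)."""
    clk_now = clock(t_ps, period_ps)
    clk_delayed = clock(t_ps - delay_ps, period_ps)
    return clk_now & (1 - clk_delayed)


STEP_PS = 10  # simulation time step (assumed)
samples = [pulse_generator(t) for t in range(0, 20_000, STEP_PS)]

# Width of the pulse produced in the first clock period.
pulse_width_ps = sum(samples[: 10_000 // STEP_PS]) * STEP_PS
print(f"pulse width = {pulse_width_ps} ps")
```

In this idealized model the pulse appears only after the rising edge and its width equals the delay of the inverting branch, which is why the delay block determines the pulse width.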

The main problem in pulse generator development is the delay part, which must negate the clock source. The delay can be made in many ways, but every topology gives different results. The simplest delay can be made with a standard inverter composed of 2 transistors [6]. The delay of the inverter depends on the size of the transistors, which is set by the width and length of the gate. The minimal width and length are given by the technology. The delay of the inverter can be increased by decreasing the width, but there is a minimal width that cannot be crossed. If the delay is still not long enough to make the correct pulse, the length of the gate can be increased. A longer gate makes the inverter slower, so the delay increases, too. Simulated topologies of delays are shown in Tab. 2.

It is necessary to think ahead because the chosen pulse generator needs to be laid out efficiently. Eight-bit, four-bit and two-bit versions of the pulsed latch will be laid out. The 2-bit and 4-bit versions are laid out as double-row standard



Fig. 4. Block diagram of multi-bit pulsed latch in normal mode.



Fig. 5. Design of global and local pulse generator.



Fig. 6. Simple pulse generator made of inverter and 2-input AND.

cells, so the generator needs to be divided into two parts; the 8-bit version will be laid out as a quad-row standard cell, where the generator needs to be divided into four parts.

The simulation was performed using a Spectre schematic netlist (without parasitic elements) within the Virtuoso environment, with uniform simulation parameters applied to each type of delay. The clock source was operated at 100 MHz with a duty cycle of 50%. The chosen frequency aligns with the specific requirements of the design intended to utilize these cells (design will not work at higher frequencies). Subsequently, the delays were adjusted (length of transistors) to achieve a pulse width of 200 ps at a load of 20 fF.

| Type of delay | Normalized total power | Normalized leakage | Normalized length of transistor |
|---|---|---|---|
| Inverter (norm.) | 1.00 | 1.00 | 1.00 |
| Three inverters | 1.03 | 1.28 | 0.23 |
| Five inverters | 1.07 | 1.45 | 0.15 |
| Stacked inverter | 0.97 | 0.98 | 0.42 |
| Double-stacked inverter | 0.95 | 0.98 | 0.24 |
| Modified double-stacked inverter | 0.98 | 0.99 | 0.38 |

**Tab. 2.** Simulated pulse generators with different delay blocks (all values are normalized to the standard inverter cell designed in the same process).

Based on these requirements, the power consumption and the area of the generator, a modified double-stacked inverter and a stacked inverter were chosen because they offer the best trade-off between these parameters. The single inverter would need a large transistor length, so the width of this part would not be equal to that of the other parts. This would lead to an asymmetrical structure that would be impossible to lay out symmetrically. Topologies with three and five inverters have a higher power consumption than the chosen topologies. Their leakage is higher, and the width of the cell would also be larger. Also, the structure could not be laid out symmetrically to the rows as it is possible in the case of a single inverter. The most suitable topology would be the double-stacked inverter, but it cannot be implemented because there is no space. The height of the cells is given by the customer, so the height of the cell is fixed. This is the reason why this topology cannot be used. A big advantage of the stacked and double-stacked inverter topologies is that they can be laid out as one structure whose width is the same as the length of the transistor. They can be laid out one above the other because of the topology.
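The electrical part of this selection can be sketched as a simple filter over the values of Tab. 2. The screening rule used here, keeping only delay blocks that are below the single-inverter baseline in both total power and leakage, is an assumption made for illustration; the layout constraints discussed above (cell height, symmetry, routing) are not captured by it.

```python
# Normalized values from Tab. 2: (total power, leakage, transistor length).
TAB2 = {
    "Inverter": (1.00, 1.00, 1.00),
    "Three inverters": (1.03, 1.28, 0.23),
    "Five inverters": (1.07, 1.45, 0.15),
    "Stacked inverter": (0.97, 0.98, 0.42),
    "Double-stacked inverter": (0.95, 0.98, 0.24),
    "Modified double-stacked inverter": (0.98, 0.99, 0.38),
}

# Assumed screening rule: better than the baseline inverter in power and leakage.
candidates = [
    name
    for name, (power, leakage, _length) in TAB2.items()
    if power < 1.0 and leakage < 1.0
]
print(candidates)
```

Only the three stacked variants pass this electrical screening; the final choice among them is then made by the layout considerations.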

In the 2-bit version, a modified double-stacked inverter topology is used because there is a lot of space and the generator is on the side of the cell. In the 4-bit and 8-bit versions, the modified double-stacked inverter must be used because there is additional routing in poly. The pulse generator is laid out in the middle of the cell. The rules do not allow a better topology, so the modified double-stacked inverter is used in the 4-bit and 8-bit versions. The rising and falling edges in every circuit are similar because the output stage is the same size. The simulation is the same in all cases in terms of load, frequency, activity, and transistor sizes, except for the length of the transistors in the delay part. Also, the pulse generators were designed to produce pulses with a width of 200 ps to investigate their behavior in the same environment. The maximal usable frequency is limited by the load (number of connected pulsed latches), power supply voltage, and temperature. There should be no minimum usable frequency because the width of the pulse is given by the delay block in the pulse generator. The generated pulses across the cross corners from the double-stacked topology are shown in Fig. 7. The schematic of the pulse generator with the double-stacked topology is shown in Fig. 8.

Fig. 7. Generated pulses from the designed pulse generator (modified double-stacked inverter) in a typical, fast, and slow corner (purple and yellow curves are for the typical corner, orange and green for the fast corner, and grey and blue for the slow corner).

Fig. 8. Pulse generator with a stacked inverter as a delay part.

#### 4.2 Pulsed Latch

This section is focused on single latches with only one output pin Q that can be controlled by pulse and without a multiplexer (scan input). Latches are compared in parameters that are specified by requirements like power consumption, area, number of transistors, minimal usable width of pulse and clock to output propagation delay. Choosing a pulsed latch is a complicated process because more parameters need to be watched at once. Simulated topologies are shown here:

- PTLA (pass transistor latch) [6], [14],
- SSALA (fully static diff. sense amp) [6], [14],
- SSA2LA (modified SSALA) [14],
- CPNLA (pseudo-nmos input buffer) [14],
- PPCLA (transparent latch) [6], [14],
- the new topology (presented in this paper).

The comparison of the pulsed latches is shown in Tab. 3. The ONK65 process at onsemi was used to simulate these topologies, aiming to assess their behavior within this particular process. All topologies were simulated under the same conditions: the same simulated time, inputs (data and clock), and frequency. The voltage was set to the nominal value of 1.2 V, and the process was set to the typical corner (typical HVT models). The pulsed latches were simulated without the chosen pulse generator (explained in the previous subsection); instead, an ideal pulse generator was used, configured to produce a pulse with a width of 200 ps at a frequency of 100 MHz. In addition to the data input and clock input, there are also internal nodes named IMP (pulse), IMPN (negated pulse), L0 (local node) and Q (output). In the case of a circuit with a negated output, there is also QN (negated output). If there is also a scan input, there will be two more inputs, namely SE (scan enable) and SI (scan input), and internally there would be SEN (scan enable negated).

| Type of pulsed latch | Norm. total power | Norm. leakage power | No. of devices | Norm. min. width of pulse | Norm. clock to output prop. delay |
|---|---|---|---|---|---|
| PTLA (norm.) | 1.00 | 1.00 | 10 | 1.00 | 1.00 |
| SSALA | 0.61 | 1.04 | 11 | 0.62 | 1.21 |
| SSA2LA | 0.63 | 0.97 | 12 | 0.70 | 1.25 |
| CPNLA | 1.08 | 1.12 | 13 | 0.39 | 0.94 |
| PPCLA | 0.46 | 0.99 | 12 | 0.28 | 0.62 |
| New topology | 0.47 | 1.04 | 12 | 0.23 | 0.80 |

Tab. 3. The comparison of latches controlled by pulse.

Six circuits were investigated, but only one can be chosen. The choice is based on the total power consumption, the number of transistors, the minimal usable pulse width, and the clock-to-output propagation delay. Leakage is neglected because the differences between the topologies are not significant. PTLA and CPNLA are not suitable because their total power is much higher than that of the other topologies (1.00 and 1.08 versus 0.46 to 0.63). SSALA and SSA2LA are not suitable either, because they need a wider minimal pulse than PPCLA and the new topology. The pulse generator would therefore have to create a wider pulse, which demands more power.
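The elimination described above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the selection procedure used in the paper: the values are copied from Tab. 3, while the class `Latch`, the function `shortlist`, and the two cut-off constants are hypothetical and were picked merely so that the filter reproduces the outcome stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Latch:
    name: str
    total_power: float      # normalized to PTLA
    leakage: float          # normalized to PTLA
    devices: int            # number of transistors
    min_pulse_width: float  # normalized to PTLA
    clk_to_q: float         # normalized to PTLA


# Values copied from Tab. 3.
TAB3 = [
    Latch("PTLA", 1.00, 1.00, 10, 1.00, 1.00),
    Latch("SSALA", 0.61, 1.04, 11, 0.62, 1.21),
    Latch("SSA2LA", 0.63, 0.97, 12, 0.70, 1.25),
    Latch("CPNLA", 1.08, 1.12, 13, 0.39, 0.94),
    Latch("PPCLA", 0.46, 0.99, 12, 0.28, 0.62),
    Latch("New topology", 0.47, 1.04, 12, 0.23, 0.80),
]

# Hypothetical cut-offs, chosen only to mirror the elimination in the text.
MAX_TOTAL_POWER = 0.70
MAX_MIN_PULSE_WIDTH = 0.30


def shortlist(latches):
    """Keep latches that pass both the power and the pulse-width cut-off."""
    return [
        latch
        for latch in latches
        if latch.total_power <= MAX_TOTAL_POWER
        and latch.min_pulse_width <= MAX_MIN_PULSE_WIDTH
    ]


candidates = shortlist(TAB3)
# Among the survivors, pick the one that tolerates the narrowest pulse.
narrowest = min(candidates, key=lambda latch: latch.min_pulse_width)

print([latch.name for latch in candidates])
print(narrowest.name)
```

Under these assumed cut-offs only PPCLA and the new topology remain, and the new topology is the one with the narrower minimal pulse.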

The last two circuits (PPCLA and the new topology) are almost the same. They differ in the feedback, which leads to different results. The total power consumption and the leakage of PPCLA are slightly better than those of the new topology, but PPCLA needs a wider pulse, which leads to higher power consumption demands on the pulse generator. The higher clock-to-output propagation delay of the new topology is treated as an advantage because the multi-bit pulsed latch needs a scan mode (serial shifter). In scan mode, the pulse width is also limited from above, so the pulse must stay within an exact range in all process corners. The new topology is therefore chosen, because this circuit can cover stricter PVTs. The reasons for this choice are explained in detail in the following section, because scan mode has a problem in cross corners. The chosen pulsed latch is shown in Fig. 10, and its input and output waveforms are shown in Fig. 9.



Fig. 9. Transient waveform of the selected pulsed latch.



Fig. 10. Schematic of a new topology of a pulsed latch.

## 4.3 Multi-bit Pulsed Latch

The main issue in creating a multi-bit pulsed latch is not selecting a single pulsed latch, but building a multi-bit pulsed latch with a scan mode (serial shifter mode) from the selected latch and keeping it functional across PVTs. This is because the pulse width in scan mode is limited not only from below but also from above. Two errors may occur in scan mode. In the first, the pulse is too narrow and the pulsed latch does not work properly. In the second, the pulse is so wide that a single pulse also passes the data signal to the second pulsed latch, so the output state of the second pulsed latch changes within the same pulse.
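The two scan-mode failure modes can be expressed as a simple window check on the pulse width. The sketch below is a first-order model under stated assumptions, not the verification flow used for these cells: it assumes that the second latch is disturbed once the pulse is at least as wide as the delay from the pulse edge to new data arriving at its scan input. The class `ScanTiming`, the function `scan_status`, and all picosecond values are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ScanTiming:
    pulse_width_ps: float       # pulse width delivered by the generator
    min_width_ps: float         # narrowest pulse the latch still captures
    shift_path_delay_ps: float  # pulse edge -> new data at next latch's SI


def scan_status(t: ScanTiming) -> str:
    """Classify one operating point of the scan chain."""
    if t.pulse_width_ps < t.min_width_ps:
        return "too narrow"    # first error: the latch does not capture
    if t.pulse_width_ps >= t.shift_path_delay_ps:
        return "race-through"  # second error: data reaches the second latch
    return "ok"


# Hypothetical operating points (not simulation results from the paper).
narrow = ScanTiming(pulse_width_ps=50, min_width_ps=60, shift_path_delay_ps=140)
wide = ScanTiming(pulse_width_ps=160, min_width_ps=60, shift_path_delay_ps=140)
# A longer shift path raises the upper limit of the allowed window.
wide_longer_path = replace(wide, shift_path_delay_ps=200)

for point in (narrow, wide, wide_longer_path):
    print(scan_status(point))
```

In this model the pulse width is acceptable only when it lies between the minimal width and the shift-path delay, and the check would have to pass at every corner of interest.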

The width of the pulse changes with process, voltage, and temperature. This means that two corners need to be examined. The fast corner is the one where the voltage and temperature are at their highest: the voltage is 1.32 V, the temperature is 200 °C, and fast models of the HVT transistors are used ($5\sigma$). The slow corner is the opposite: the voltage is 1.08 V, the temperature is -40 °C, and slow models of the HVT transistors are used ($5\sigma$). The nominal corner is 1.2 V and 27 °C with typical HVT models without dispersion.

Figure 11 shows a 2-bit pulsed latch that also includes a negated output stage, to provide more demanding conditions in simulation. As mentioned before, two issues might occur. They define the minimal and maximal pulse width for which the circuit is fully functional. This range is given by the technology, which cannot be changed, by the topology of the latch, and by the point of the chain where the signal is taken to the next latch. The new topology is chosen because its minimal pulse width is the smallest in Tab. 3 and its clock-to-output propagation delay is higher than that of the standard PPCLA; the additional inverters in the path are used as an advantage. The signal for the next latch is often taken from the internal node that represents the output data of the first latch. In none of the circuits does such an internal node give full functionality, because the propagation delay between the pulse and the path to the next latch stage is too short. That means the circuit can work only in the fast corner or in the slow corner, not in both.

Fig. 11. 2-bit pulsed latch with scan mode and negated output.



Fig. 12. Layout of 2-bit plain pulsed latch (new topology) with scan mode and negated output.

There are two solutions to this problem. The first is an additional delay, which can be added to the path, in particular before the mux input of the next latch. The additional delay block increases power consumption and area, but in exchange it gives a robust, verified solution. The second solution is to take the signal for the next latch from the output stage, where Q is located. The path is then long enough to keep the circuit stable in all corners, fast and slow. The second solution has better power consumption and area because no inverter is added. Its disadvantage is that the maximal output load is slightly decreased. However, the maximal output load will not be decreased as significantly in comparison to the first solution. Also, characterizing this cell as a digital standard cell would be quite a problem. The first solution is preferred because its advantages outweigh its disadvantages and because the design will be used in the automotive industry, so it must be robust. A robust solution is necessary because the cross corners are very strict. The layout of the 2-bit plain version is shown in Fig. 12.

## 4.4 Comparison of Designed Multi-bit Pulsed Latches and Multi-bit Flip-flops

This section compares multi-bit flip-flops with the multi-bit pulsed latches designed in the high-density and high-speed libraries; the pulsed latches are compatible with the multi-bit flip-flops designed in these libraries. Simulations were run on layout netlists that include parasitic elements such as resistances and capacitances. The comparison is based on the power consumption and the area of the cell. The designed multi-bit pulsed latches work in the cross corners mentioned before. The comparison is divided into 6 tables for the HVT (high-density) library and 3 tables for the SVT (high-speed) library. Three bit widths were designed: 2-bit, 4-bit, and 8-bit. Each comes in two versions: a plain version without any additional inputs and a reset version with an active-low reset input. Multi-bit pulsed latches with different output strengths were also designed, but they are not compared to multi-bit flip-flops.

For the HVT library, the pulse generator must cover the large dispersion demanded by automotive requirements, which leads to a bigger pulse generator. If the pulse generator did not have to meet such difficult conditions, the results would be better; this is shown later for the high-speed library with SVT devices. The first results for the HVT library, those of the 2-bit plain pulsed latch, are shown in Tab. 4. The power consumption in both modes is significantly worse than that of the 2-bit flip-flop. This is caused by the pulse generator: it accounts for a significant share of the total power consumption, it must be designed to cover a wide spectrum of PVTs, and it serves just two latches. The area is smaller by about 10.34%. The power consumption differs between normal and scan mode. For both the 2-bit and 4-bit pulsed latches, the power consumption is significantly worse than that of the multi-bit flip-flops. The results of the 2-bit pulsed latch with reset are shown in Tab. 5. The area saving is even larger than for the plain version: the area is about 14.71% smaller than that of the 2-bit flip-flop with reset.

The 4-bit pulsed latches are more promising in terms of area saving. The results of the plain 4-bit pulsed latch are shown in Tab. 6. The area is about 19.30% smaller than that of the 4-bit flip-flop, which is a great success. The results of the 4-bit pulsed latch with reset are shown in Tab. 7; the area saving is about 20.31%. The larger area saving in comparison to the multi-bit flip-flop arises because the pulse generator becomes less significant in comparison to the whole layout.
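The Difference [%] column in Tab. 4 to Tab. 9 appears to relate to the normalized pulsed-latch value as (value - 1) × 100, computed before rounding to two decimals. This relation is inferred from the numbers and is not stated explicitly in the paper. The short check below, with the hypothetical helper `normalized_from_difference`, converts each percentage back to a normalized value and compares it with the tabulated one.

```python
def normalized_from_difference(diff_percent: float) -> float:
    """Normalized pulsed-latch value implied by a Difference [%] entry."""
    return round(1.0 + diff_percent / 100.0, 2)


# (table, row) -> (normalized pulsed-latch value, Difference [%]),
# copied from Tab. 4 to Tab. 9; the flip-flop is the reference (1).
HVT_ROWS = {
    ("Tab. 4", "normal"): (1.32, 31.52),
    ("Tab. 4", "scan"): (1.58, 57.89),
    ("Tab. 4", "area"): (0.90, -10.34),
    ("Tab. 5", "normal"): (1.28, 27.69),
    ("Tab. 5", "scan"): (1.50, 49.98),
    ("Tab. 5", "area"): (0.85, -14.71),
    ("Tab. 6", "normal"): (1.32, 31.52),
    ("Tab. 6", "scan"): (1.38, 37.75),
    ("Tab. 6", "area"): (0.81, -19.30),
    ("Tab. 7", "normal"): (1.37, 36.69),
    ("Tab. 7", "scan"): (1.47, 46.89),
    ("Tab. 7", "area"): (0.80, -20.31),
    ("Tab. 8", "normal"): (1.09, 8.53),
    ("Tab. 8", "scan"): (1.01, 0.81),
    ("Tab. 8", "area"): (0.82, -17.86),
    ("Tab. 9", "normal"): (1.06, 6.40),
    ("Tab. 9", "scan"): (0.93, -7.39),
    ("Tab. 9", "area"): (0.81, -19.05),
}

# Collect every row whose two columns disagree after rounding.
mismatches = [
    key
    for key, (normalized, diff) in HVT_ROWS.items()
    if abs(normalized_from_difference(diff) - normalized) > 1e-9
]

print(mismatches)
```

A negative difference therefore means that the pulsed latch is smaller or consumes less than the flip-flop it is compared with.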



|                                    | Mode   | Pulsed latch | Flip-flop | Difference [%] |
|------------------------------------|--------|--------------|-----------|----------------|
| Normalized<br>power<br>consumption | Normal | 1.32         | 1         | 31.52          |
|                                    | Scan   | 1.58         | 1         | 57.89          |
| Normalized area                    | -      | 0.90         | 1         | -10.34         |

Tab. 4. Comparison of the plain 2-bit pulsed latch and the plain 2-bit flip-flop.

|                                    | Mode   | Pulsed latch | Flip-flop | Difference [%] |
|------------------------------------|--------|--------------|-----------|----------------|
| Normalized<br>power<br>consumption | Normal | 1.28         | 1         | 27.69          |
|                                    | Scan   | 1.50         | 1         | 49.98          |
| Normalized area                    | -      | 0.85         | 1         | -14.71         |

Tab. 5. Comparison of the reset 2-bit pulsed latch and the reset 2-bit flip-flop.

|                                    | Mode   | Pulsed latch | Flip-flop | Difference [%] |
|------------------------------------|--------|--------------|-----------|----------------|
| Normalized<br>power<br>consumption | Normal | 1.32         | 1         | 31.52          |
|                                    | Scan   | 1.38         | 1         | 37.75          |
| Normalized area                    | -      | 0.81         | 1         | -19.30         |

Tab. 6. Comparison of the plain 4-bit pulsed latch and the plain 4-bit flip-flop.

|                                    | Mode   | Pulsed latch | Flip-flop | Difference [%] |
|------------------------------------|--------|--------------|-----------|----------------|
| Normalized<br>power<br>consumption | Normal | 1.37         | 1         | 36.69          |
|                                    | Scan   | 1.47         | 1         | 46.89          |
| Normalized area                    | -      | 0.80         | 1         | -20.31         |

Tab. 7. Comparison of the reset 4-bit pulsed latch and the reset 4-bit flip-flop.

|                                    | Mode   | Pulsed latch | Flip-flop | Difference [%] |
|------------------------------------|--------|--------------|-----------|----------------|
| Normalized<br>power<br>consumption | Normal | 1.09         | 1         | 8.53           |
|                                    | Scan   | 1.01         | 1         | 0.81           |
| Normalized area                    | -      | 0.82         | 1         | -17.86         |

 Tab. 8. Comparison of the plain 8-bit pulsed latch and the plain 8-bit flip-flop.

|                                    | Mode   | Pulsed latch | Flip-flop | Difference [%] |
|------------------------------------|--------|--------------|-----------|----------------|
| Normalized<br>power<br>consumption | Normal | 1.06         | 1         | 6.40           |
|                                    | Scan   | 0.93         | 1         | -7.39          |
| Normalized area                    | -      | 0.81         | 1         | -19.05         |

Tab. 9. Comparison of the reset 8-bit pulsed latch and the reset 8-bit flip-flop.

As the number of bits in a multi-bit pulsed latch increases, more power and area should be saved, because the pulse generator becomes less significant in comparison to the whole area and power consumption of the pulsed latches. The most significant differences are shown in Tab. 8 and Tab. 9, which compare the 8-bit pulsed latches (plain and reset) with the 8-bit flip-flops. The power consumption of the plain version in scan mode is almost the same as that of the plain multi-bit flip-flop: it is about 8.53% higher in normal mode and 0.81% higher in scan mode. The area is about 17.86% smaller. The reset version is even better: the power consumption is about 6.40% worse in normal mode and about 7.39% better in scan mode, and the area is about 19.05% smaller. The area saving is smaller than for the 4-bit versions because the generator had to be strengthened.
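The amortization argument can be written as a first-order area model. The symbols are assumptions introduced here for illustration and do not appear in the paper: $A_G$ is the area of the shared pulse generator, $A_L$ the area of one pulsed latch bit, $A_F$ the area of one flip-flop bit, and $N$ the number of bits. Assuming that the flip-flop area scales linearly with $N$ and that the same generator is shared by all bits:

$$
\frac{A_{\mathrm{PL}}(N)}{A_{\mathrm{FF}}(N)} \approx \frac{A_G + N A_L}{N A_F} = \frac{A_L}{A_F} + \frac{A_G}{N A_F}
$$

The second term shrinks as $N$ grows, which is the sense in which the generator becomes less significant. The model assumes a constant $A_G$; when the generator has to be strengthened for more bits, as reported for the 8-bit cells, $A_G$ grows and the ratio no longer falls monotonically. An analogous expression could be written for the power consumption.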

These results are for the strict automotive PVTs (HVT library), which lead to a more complex solution with higher power consumption but a smaller area. The cells can still serve as a better solution and a replacement for multi-bit flip-flops because they save area. It should be stressed that these PVTs are very strict, so the circuit can be used even more readily in industries with less strict requirements. Less strict PVTs can be found in the consumer industry (chips for mobile phones). The PVTs of the consumer industry differ from the automotive PVTs shown before in process dispersion and temperature. The cross corners were simulated with $3\sigma$ models at -40 °C and 175 °C. The voltages were the same as before: 1.08 V for the slow corner and 1.32 V for the fast corner. Table 10 shows the 2-bit plain pulsed latch and confirms that the pulse generator has a significant power consumption. The normalized power consumption of the pulsed latch is higher than that of the flip-flop, but the area is about 7.69% smaller.

Table 11 shows the 4-bit pulsed latch, which consumes even less power than the 4-bit flip-flop: 4.24% less in normal mode and 7.43% less in scan mode. The area saving is also larger than for the 2-bit version because the pulse generator becomes insignificant in comparison to the whole design; the area is about 28.07% smaller. Finally, Table 12 compares the 8-bit pulsed latch with the 8-bit flip-flop. The results are better still: power consumption is 13.14% lower in normal mode and 15.31% lower in scan mode, and the area is about 33.93% smaller.

|                                    | Mode   | Pulsed latch | Flip-flop | Difference [%] |
|------------------------------------|--------|--------------|-----------|----------------|
| Normalized<br>power<br>consumption | Normal | 1.12         | 1         | 12.22          |
|                                    | Scan   | 1.11         | 1         | 10.98          |
| Normalized area                    | -      | 0.92         | 1         | -7.69          |

Tab. 10. Comparison of the plain 2-bit pulsed latch and the plain 2-bit flip-flop in less strict library (HS – SVT).

|                                    | Mode   | Pulsed latch | Flip-flop | Difference [%] |
|------------------------------------|--------|--------------|-----------|----------------|
| Normalized<br>power<br>consumption | Normal | 0.96         | 1         | -4.24          |
|                                    | Scan   | 0.93         | 1         | -7.43          |
| Normalized area                    | -      | 0.72         | 1         | -28.07         |

Tab. 11. Comparison of the plain 4-bit pulsed latch and the plain 4-bit flip-flop in less strict library (HS – SVT).

|                                    | Mode   | Pulsed latch | Flip-flop | Difference [%] |
|------------------------------------|--------|--------------|-----------|----------------|
| Normalized<br>power<br>consumption | Normal | 0.87         | 1         | -13.14         |
|                                    | Scan   | 0.85         | 1         | -15.31         |
| Normalized<br>area                 | -      | 0.66         | 1         | -33.93         |

Tab. 12. Comparison of the plain 8-bit pulsed latch and the plain 8-bit flip-flop in less strict library (HS – SVT).

# 5. Conclusion

Special digital standard cells are commonly used and must be designed with precision. A set of multi-bit pulsed latches was investigated and designed for automotive purposes (library with strict PVTs) and consumer purposes (library with less strict PVTs) as a better replacement for the multi-bit flip-flops that are often used in semiconductor chip designs. For automotive, the results are promising only in the area saved compared to multi-bit flip-flops, because the pulse generator in the multi-bit pulsed latches must be strengthened, so the advantage of saving power is lost. The automotive models were simulated with the $5\sigma$ dispersion required in automotive. It is confirmed that multi-bit pulsed latches can be used as a better replacement for multi-bit flip-flops if area needs to be saved, but at the cost of higher power consumption.

Multi-bit pulsed latches were also designed in a different, high-speed (HS) library with $3\sigma$ device dispersion and standard threshold voltage (SVT) for the consumer industry. The results are more interesting than for the automotive industry: because the device dispersion is not so big, the pulse generator can be designed with a smaller delay. The power consumption of the 2-bit version of the multi-bit pulsed latch is worse than that of the multi-bit flip-flop because the power consumption of the pulse generator is significant in comparison to the whole design. The other versions are better in both parameters: power consumption and area.

Both designed versions, for the automotive industry and for the consumer industry, can be used as a better replacement for multi-bit flip-flops. The automotive designs serve only to save area, whereas the consumer designs save both area and power. This paper confirms that multi-bit pulsed latches can be used as a better replacement, but these cells are not recommended for designs that require strict PVTs and high reliability, because with high demands on device dispersion the cells have fewer advantages over a standard multi-bit flip-flop. With their performance and cost-efficiency, the digital standard cells detailed in this paper are better suited to a wide range of consumer applications than to automotive.

In future work, it would be valuable to conduct a detailed analysis of these two designs, with a particular focus on electromigration, voltage drop, reliability, and verification on silicon. Electromigration analysis is essential because the pulse generator operates with current spikes, which could lead to short circuits or voids. Reliability assessment is crucial, especially if the design is intended for an industry where safety is a priority (automotive). Additionally, silicon verification is necessary because the behavior of these structures on actual silicon may differ from what is observed in the simulation model, even with parasitic extraction.

## Acknowledgments

This work was supported by the Internal Grant Agency of Brno University of Technology, project no. FEKT-S-23-8191. I would like to thank my colleagues at work for their unlimited wisdom they have provided me during this analysis.

## References

- [1] WU, X., ZHANG, C., DU, W. An analysis on the crisis of "Chips shortage" in automobile industry — based on the double influence of COVID-19 and trade friction. *Journal of Physics: Conference Series*, 2021, vol. 1971, p. 1–5. DOI: 10.1088/1742-6596/1971/1/012100
- [2] RABAEY, J. Low Power Design Essentials. 1st ed. Springer US, 2009. ISBN: 9780387717135
- [3] HOROWITZ, M., ALON, E., PATIL, D., et al. Scaling, power, and the future of CMOS. In *IEEE International Electron Devices Meeting (IEDM Technical Digest)*. Washington (DC, USA), 2005, p. 7–15. DOI: 10.1109/IEDM.2005.1609253
- [4] ROY, K., MUKHOPADHYAY, S., MAHMOODI-MEIMAND, H. Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits. *Proceedings of the IEEE*, 2003, vol. 91, no. 2, p. 305–327. DOI: 10.1109/JPROC.2002.808156
- [5] SINGH, K., ROSAS, O. A. R., JIAO, H., et al. Multi-bit pulsed-latch based low power synchronous circuit design. In *IEEE International Symposium on Circuits and Systems (ISCAS)*. Florence (Italy), 2018, p. 1–5. DOI: 10.1109/ISCAS.2018.8351251
- [6] ROSAS, O. A. R. Multi-bit Pulse-based Latches for Low Power Design. Master's Thesis. Eindhoven University of Technology, 2017.
- [7] WESTE, N. H. E., HARRIS, D. M. CMOS VLSI Design: A Circuits and Systems Perspective. 4th ed. Boston: Addison Wesley, 2010. ISBN: 0-321-54774-8
- [8] RABAEY, J. M., CHANDRAKASAN, A., NIKOLIĆ, B. Digital Integrated Circuits: A Design Perspective. 2nd ed. Upper Saddle River: Pearson, 2003. ISBN: 0130909963
- [9] KAESLIN, H. Digital Integrated Circuit Design: From VLSI Architectures to CMOS Fabrication. Cambridge: Cambridge University Press, 2008. ISBN: 978-0-521-88267-5
- [10] KANG, S. M., LEBLEBICI, Y. CMOS Digital Integrated Circuits: Analysis and Design. 3rd ed. McGraw-Hill, 2003. ISBN: 9780072460537
- [11] NILSSON, P. Arithmetic reduction of the static power consumption in nanoscale CMOS. In 13th IEEE International Conference on Electronics, Circuits and Systems. Nice (France), 2006, p. 656–659. DOI: 10.1109/ICECS.2006.379874
- [12] KAO, J., NARENDRA, S., CHANDRAKASAN, A. Subthreshold leakage modeling and reduction techniques [IC CAD tools]. In *IEEE/ACM International Conference on Computer Aided Design* (ICCAD 2002). San Jose (CA, USA), 2002, p. 141–148. DOI: 10.1109/ICCAD.2002.1167526
- [13] RAZAVI, B. Fundamentals of Microelectronics. 2nd ed. Wiley, 2013. ISBN: 978-1118156322
- [14] HEO, S., KRASHINSKY, R., ASANOVIC, K. Activity-sensitive flip-flop and latch selection for reduced energy. *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, 2007, vol. 15, no. 9, p. 1060–1064. DOI: 10.1109/TVLSI.2007.902211

- [15] JACOB BAKER, R. CMOS Circuit Design, Layout, and Simulation. 3rd ed. Hoboken: Wiley-IEEE Press, 2010. ISBN: 978-0-470-88132-3
- [16] KHANNA, V. K. Integrated Nanoelectronics. Nanoscale CMOS, Post-CMOS and Allied Nanotechnologies. New Delhi (India): Springer, 2016. ISBN: 978-81-322-3623-8
- [17] ROY, K., PRASAD, S. Low-Power CMOS VLSI Circuit Design. New York: John Wiley, 2000. ISBN: 0-471-11488-X
- [18] ELSHARKASY, W. M. Low Power Reliable Design using Pulsed Latch Circuits. Dissertation. University of California, Irvine, 2017.
   [Online] Cited 2023-07-18. Available at: https://escholarship.org/uc/item/5ss2z430

# About the Author ...

**Vojtech KRAL** was born in Nachod, Czech Republic on September 9th, 1996. He received his bachelor's degree in Electronics and Communication from the Brno University of Technology in 2019 and his master's degree in 2022. He is currently enrolled in a Ph.D. program. Since 2018 he has been working for onsemi, where he is mainly focused on digital standard cell libraries. His research interests include ultra-low power digital standard cells in automotive, mobile communication libraries, and an automatic layout tool that generates layouts of digital standard cells from schematic netlists.