Challenges of CAD Development for Datapath Design


Previous Next     Page 4 of 12

III. Datapath Logic Synthesis

Logic synthesis, which transforms a design from RTL to circuit level, has been widely studied for control logic. Logic synthesis [5, 6] involves two steps: logic minimization followed by technology mapping to a user-specified library. Datapath circuits possess a very high degree of regularity that has to be preserved throughout the design process to achieve high density and performance. If the traditional logic synthesis approach based on logic minimization is used, then some regularity would be lost, resulting in inferior results. Therefore, an ideal datapath synthesis approach should first extract the regularity inherent in RTL descriptions prior to mapping the circuit to a desired technology. The extracted regularity results in a design hierarchy, which should be preserved to achieve high design quality as well as productivity.

We propose a novel methodology for logic synthesis of datapath circuits, where the datapath regularity is first extracted and then the circuit is mapped to a desired technology while preserving regularity. The input to our synthesis approach is an RTL description of a datapath circuit. Regularity in the circuit implies the existence of subcircuits, called templates, which have multiple instances in the circuit. Regularity extraction first identifies a sufficiently large set of templates and their instances, and then completely covers the circuit by a subset of these template instances. The template instances are then grouped to form datapath vectors. A schematic of the datapath is generated using these vectors and the boundary constraints on the I/O buses and signals. The schematic helps the designer in understanding the circuit and in making important decisions about or changes to the templates and vectors identified so far. The next step is to map the templates to static and dynamic logic as desired, thus resulting in efficient multi-technology designs. Finally, the mapped templates are sized according to the loading on the primary outputs of the circuit.

Figure 3: HDL description of a small datapath circuit used to illustrate our synthesis approach

Figure 3: HDL description of a small datapath circuit used to illustrate our synthesis approach

We describe below in detail the various steps in our synthesis methodology of datapath circuits. We explain our methodology with the aid of the circuit in Figure 3 (the corresponding logic diagram is shown in Figure 4).

3.1. Regularity Extraction Techniques

The task of regularity extraction is to identify a set of templates and their instances from the RTL description of the circuit (template generation step), and then to cover the given circuit by a subset of these templates (circuit covering step), where the objective is to use large templates that have a large number of instances. Figure 5 illustrates a circuit cover with four templates, where template T1 has six instances, T2 has three instances, and so on. The extraction step involves a tradeoff, since a large template usually has a few instances, while a small template has a high number of instances. Note that the template composed of T2 followed by T1 has only three instances, compared to six instances of T1. Usually, a large template implies a better optimization of area and performance, while a template with more instances requires less design effort, assuming a template is synthesized only once for all its instances.

Figure 4: Logic diagram of the circuit of Figure 3; the four templates shown here form a circuit cover

Figure 4: Logic diagram of the circuit of Figure 3; the four templates shown here form a circuit cover

Several techniques for extraction of functional regularity have been proposed in the literature [3, 4, 8, 9, 10, 11, 12]. Most of these techniques focus on covering a circuit by templates, assuming that a library of templates is provided by the user. Very few techniques address the problem of generating a good set of templates. Given a library of templates, Corazao et al. [8, 11] address the problem of mapping a circuit described at a behavioral level using templates from the target library. Their approach addresses several key subproblems, such as finding complete as well as partial matches of a template and selecting a good set of templates to optimize the clock period. Rao and Kurdahi [12] represent the input circuit as well as templates from the given library by strings, and they use a string matching algorithm to find all instances of the template in the circuit. These authors present heuristics to generate a set of templates; the final cover is highly sensitive to these templates. Odawara et al. [9] present a methodology to identify structural regularity in highly regular datapaths. In their method, latches driven by the same control signals as initial templates are chosen and used to grow larger templates. Odawara's approach identifies one-dimensional regularity in terms of bit-slices of the datapath. Other approaches by Nijssen et al. [10] and Arikati et al. [3] extend Odawara's methodology to identify bit slices as well as stages of datapath circuits. These structural methods perform well for highly regular circuits, but might not work for circuits with a mix of datapath and control logic. A problem similar to regularity extraction is technology mapping, where the input circuit is covered by cells (templates) from a given library. Keutzer [7] proposed partitioning the circuit into rooted trees and then mapping the trees using library cells, by using dynamic programming. All the above-mentioned techniques address the problem of covering a circuit by templates, where the templates are either provided by the user or generated in an ad hoc manner. None of these techniques deal with the systematic generation of a set of templates for a given circuit.

We have designed an efficient and robust approach for extraction of functional regularity [13, 14], where the set of all possible templates is generated automatically for the input circuit under two simplifying, yet practical assumptions: (a) only maximal templates are considered, where a template is maximal if and only if all its instances are not entirely covered by instances of another template, and (b) input permutations of gates in the RTL description are ignored. The number of templates is reduced to within V2, where V is the number of components in the circuit. We have demonstrated that a wide range of efficient covers are obtained for various benchmarks from the set of templates generated by our approach [13]. Since a sufficiently large set of templates is generated, and the binate covering problem is inherently difficult [5], we employ simple and efficient heuristics to cover the circuit. Our approach recursively selects a template from the complete set of templates, based on one of the following heuristics, and deletes all its non-overlapping instances from the input circuit, until the entire circuit is covered.
(a)    Largest-fit-first (LFF) heuristic: select the template with the maximum area, where the area of every component is given.
(b)    Most-frequent-fit-first (MFF) heuristic: select the template with the maximum number of instances.

These two heuristics give different covers; other heuristics can be used to generate a range of covers from which the designer can choose the most desirable cover. In fact, we can represent the set of covers by a template hierarchy, where regularity among different templates is recursively extracted [14]. In the event that a template is specified by the designer, using our approach, all its instances can be generated and used for finding a cover. Thus, a cover, which is a mix of automatically extracted and user-specified templates, can be generated. (We have filed a patent on our regularity extraction approach [15]).

3.2. Vector Identification

So far, we have generated templates using functional regularity [13] without accounting for the circuit structure in terms of the interconnections among the template instances. As a result, the templates do not directly correlate with the datapath vectors. For example, the six instances of template T1 in Figure 4 should belong to two different vectors. (Here, a vector is defined as a set of template instances that are grouped together for subsequent synthesis and layout stages.) We now consider structural regularity to transform templates into datapath vectors [13]. We explain the steps of vector identification using the example of Figure 4; the resulting vectors are shown in Figure 5.

  • Simple vectors: The instances of a template are partitioned into vectors, which we call simple vectors. For example, the template T1 of Figure 4 is partitioned into two simple vectors, SV1 with two instances and SV2 with four instances. The remaining templates result in a single simple vector each.
  • Composite vectors: Simple vectors of different templates are grouped, if possible, to form composite vectors. For example, simple vector SV1 of template T1 is grouped with the simple vector of template T3 to form a composite vector V1 (see Figure 5).

The resulting vectors of the template cover of Figure 4 are shown in Figure 5. We use a set of efficient heuristics to group template instances to form simple or composite vectors. These heuristics are listed below.

  1. Control/data inputs: The input signals of template instances are classified as control or data from the HDL description of Figure 3, e.g., sel1 is a control signal, while a[0] is a data signal. The instances with the same control inputs and similar data inputs are grouped together.
  2. Output signal name: The instances whose outputs drive the same bus are grouped together. For example, the instances of templates T2 and T4 are grouped together, since their outputs form the bus g[3:0].
  3. Circuit topology: Two template instances are grouped in the same vector, only if one of them is not in the transitive fanin of another. This heuristic will ensure that the template T1 (Figure 4) would be partitioned into at least two simple vectors, since two of its instances are in the transitive fanin of two other instances.

Figure 5: Schematic of example circuit obtained after forming vectors from the templates of Figure 4

Figure 5: Schematic of example circuit obtained after forming vectors from the templates of Figure 4

3.3. Schematic Generation

A schematic of the datapath circuit is generated using the vectors identified earlier and the control/data assignment to the signals. The schematic for the example circuit is shown in Figure 5. The schematic is essential to allow designers to control the design process: (a) they can get a much better understanding of the circuit than they could from the HDL description; (b) they can modify the design hierarchy and floorplan by merging/breaking templates or vectors, changing the control/data orientation of signals, and modifying the order of vectors. An example of such a modification is merging templates T2 and T1 to form a larger template with three instances, which might lead to better optimization during subsequent steps.

3.4. Technology Mapping

The input to technology mapping is the set of datapath vectors and the I/O timing requirements in terms of input arrival times and output loads. The partition of the circuit into vectors (or underlying templates) allows the designer to select a desired technology for each template independently. Currently, our synthesis flow assumes that the mapping of templates is performed manually, which can be easily automated due to the small size of templates. We explain several choices for mapping of templates.

Figure 6: Several mappings of template T1 (Figure 4) to static and dynamic logic

Figure 6: Several mappings of template T1 (Figure 4) to static and dynamic logic

  • Static logic: The traditional approach of mapping a circuit to static logic first decomposes the circuit into smaller directed-acyclic graphs (DAGs) [5, 7] and then independently maps each DAG to the specified library of static cells. In our case, the templates are small enough to be mapped directly without any decomposition. Figure 6 illustrates two mappings of template T1 (Figure 4) to static logic. Here, mapping 2 is suitable for template T1 in vector V3 of Figure 5, since one of the data input signals arrives later than the other. On the other hand, mapping 1 is suitable for T1 in vector V1 (Figure 5). Thus, depending on the template usage in the circuit, we might have to use several mappings of a template.
  • Dynamic logic: A template can be mapped to dynamic logic to achieve better timing; however, noise-related issues have to be considered, such as the length of the input and output signals of the template. Figure 4 also shows a mapping of template T1 (Figure 4) to dynamic logic. We are looking into automating the mapping of templates to dynamic logic.
  • Macro cells: Datapath circuits employ commonly occurring logic blocks, such as incrementers, adders, shifters, etc. A library of various mappings of these specialized datapath blocks for a range of area, performance, and power values will be required. For example, the incrementer in the example circuit of Figures 3 - 5 can be replaced by one of its mapped versions prior to extracting regularity from the HDL description; our synthesis flow would then result in vectors V1 and V3 shown in Figure 5, while V2 would correspond to a macro cell.

3.5. Gate Sizing

Once all the templates of a circuit are mapped to the desired technology, every gate is sized to satisfy the output load requirements. The output load capacitance of a gate comprises the following components:

  1. Gate capacitance: The capacitance values of the gates driven by this particular gate are available after the technology mapping step.
  2. Diffusion capacitance: The diffusion capacitance of the gate is also known after technology mapping.
  3. Interconnect capacitance: The capacitance is available only after the post-synthesis steps of floorplanning and RC estimation. Therefore, interconnect capacitance is used only in the gate-sizing step in the subsequent design iterations.
  4. Primary output load capacitance: The load capacitance is already specified for the primary outputs of the circuit.

The sizing of the gates of the mapped circuit is performed starting from the primary outputs and traversing back to the primary inputs, where the output load requirement is satisfied for every gate encountered. If there are loops in the circuit, then the gate sizes will take a few iterations to converge.

Different instances of a template mapping will be sized differently depending on the output load requirements. In general, a template with multiple instances can have several mappings, where each mapping can have several different gate sizes.

Gate sizing is performed again after the interconnect capacitance values are obtained from the floorplanning and RC estimation steps.

3.6. Results

While the steps of technology mapping and sizing are still under development, we have implemented prototypes for regularity extraction and vector identification. We list below the results of regularity extraction and vector identification on two datapath blocks in terms of the number of templates, vectors, and their instances.

Ckt. No. of components No. of templates (instances) Regularity index No. of datapath vectors
Block1 464 9(400) 2% 7
Block2 318 17(124) 12% 9

We have defined an index, called a regularity index, to evaluate the results of regularity extraction [13]. The regularity index is defined as the percentage of the number of logic components in all the templates to the total number of logic components in the circuit. The regularity index correlates to the reduction in the design effort, assuming that a template is not synthesized multiple times for its multiple instances.




Previous Next     Page 4 of 12