Danny Rittman
Introduction
Increasing power densities, cost pressure and the associated push to exploit the robustness limits of modern semiconductors are making electro-thermal optimization increasingly important alongside electrical optimization. Growing packing density and power consumption significantly affect a design's electro-thermal behavior.
The increasing variability of key process parameters in nanometer CMOS technologies has increased the impact of substrate and metal line temperatures on the reliability and performance of devices and interconnections. Recent data shows that more than 50% of all IC failures are related to thermal issues.
An electro-thermal aware layout design can be implemented early in the design stage, creating a Thermal Friendly Layout by Construction. Starting with library cell design, the basic layout building blocks, we can achieve homogeneous, block-level thermal dissipation and therefore lower current consumption and better performance.
This article presents a brief discussion of key techniques for electro-thermal improvement of library cell layout structures in order to achieve a thermal aware design. In addition, the development of a software tool is proposed to modify standard cells automatically, creating an automatic thermal-efficient layout system.
General
Today’s electronic designs are constantly driven by the demand to shrink in size and deliver faster performance. These demands, however, translate into high power densities, higher operating temperatures and reduced reliability. Furthermore, local hot spots, which have much higher temperatures compared to the average die temperature, are becoming more prevalent in VLSI circuits. Elevated temperatures are a major contributor to lower semiconductor performance and reliability.
If heat is not removed at a rate equal to or greater than its rate of generation, junction temperatures will rise. Higher junction temperatures reduce the mean time to failure (MTTF) of the devices. Better thermal design for semiconductors will directly impact overall system performance and reliability. Designing a cost-competitive power electronics system requires careful consideration of the thermal domain as well as the electrical domain. Over-designing the system adds unnecessary cost and weight; under-designing it may lead to overheating and even system failure. Finding an optimized solution requires a good understanding of how to predict the operating temperatures of the system’s power components and how the heat generated by those components affects neighboring devices.
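To make the temperature dependence concrete, a commonly used model (standard in reliability engineering, not specific to this work) is Black's equation for electromigration-limited MTTF. The Python sketch below uses illustrative constants only; it shows how a modest rise in junction temperature shortens the expected lifetime.

```python
import math

def mttf_black(j, t_kelvin, a=1.0, n=2.0, ea=0.7):
    """Black's equation for electromigration MTTF (illustrative constants).
    j: current density [A/cm^2], t_kelvin: junction temperature [K],
    a: process-dependent constant, n: current exponent, ea: activation
    energy [eV]."""
    k = 8.617e-5  # Boltzmann constant [eV/K]
    return a * j**(-n) * math.exp(ea / (k * t_kelvin))

# A 10-degree rise at constant current density shortens the expected lifetime:
for t in (358.0, 368.0):  # 85 C vs. 95 C junction temperature
    print(f"T = {t} K -> relative MTTF = {mttf_black(1e6, t):.3e}")
```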
No single thermal analysis tool or technique works best in all situations. Good thermal assessments require a combination of analytical calculations using thermal specifications, empirical analysis and thermal modeling. The art of thermal analysis involves using all available tools to support each other and validate their conclusions.
Approach
This paper first presents initial research on an automatic electro-thermal improvement system, then describes IC layout techniques that improve thermal dissipation first at the standard cell level and then at the block level. Addressing the thermal aspect early in the design stages, within standard cells, leads to overall block- and chip-level thermal reduction and uniformity. This type of approach works well for all types of designs, particularly power devices, but the concepts herein are general and can be applied to lower-power components as well.
The system’s concept is to analyze the layout structure of standard cells for thermal behavior automatically and then perform physical modifications that improve the cells' thermal characteristics. The system includes an advanced geometric engine that reads the process design rules and connectivity information and performs process-related structural analysis to find possible thermal improvements.
The system can read external thermal information as a reference but works independently according to internal guidelines. Its results are thermally efficient standard cell layout structures that are DRC and LVS clean. In addition, the system maintains each cell's I/O and timing characteristics.
IC Layout Thermal Engine, a major task in today’s geometries
Today’s electro-thermal methods face an increasing set of challenges to their ability to maintain speed and result accuracy for nanometer designs. Increasing complexity and integration, shrinking geometries and the consequent physical effects have been ongoing challenges of the nanometer era. First-pass silicon is a critical requirement in today’s IC design market.
Each advance in process technology results in larger numbers of process layers and transistors, requiring more thermal design and manufacturing rule checks before manufacturing handoff. This causes a dramatic increase in thermal check runtime and in violation analysis effort, threatening tape-out deadlines and time to market. The size and complexity of modern designs add another crucial challenge to current thermal verification technology. Designers need more effective and faster thermal improvement capabilities in order to deliver accurate results and better performance.
One of the major challenges of automatic electro-thermal correction of IC layout structures is maintaining the process design rules. Particularly in advanced processes, 10nm and below, any modification made to the layout triggers a wide series of design rule verifications.
Another major task is to maintain accurate connectivity information (LVS) of the standard cell. Finally, all timing and electrical characteristics need to be maintained in order to ensure proper modeling and electrical behavior.
A few methods are suggested to preserve the standard cell's exact structure while improving its electro-thermal characteristics.
Method one: bitmap methods, which were widely used in early approaches. The layout is digitized into a grid of square cells, with each mask layer represented by a separate bit in each cell. Bitmaps are attractive because of the simplicity of some operations, such as Boolean operations (the AND of two masks, for example) and space/width checking. Many bitmap approaches are based on Baker’s algorithm. There are, however, disadvantages to bitmap approaches. The first is that a bitmap approach requires processing a large amount of data; this demands large memory bandwidth and high parallelism in order to produce results with acceptable performance. The second disadvantage is that in a design system where the grid spacing (i.e., the minimum feature spacing) is much smaller than the minimum feature size, a much larger window is needed to check for width or spacing errors. In this case, significant time will be spent comparing error templates with windows.
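As a rough illustration of the bitmap approach (an invented two-layer grid, not Baker's algorithm itself), the following Python sketch performs a Boolean AND of two masks and a simple run-length width check.

```python
import numpy as np

# Two mask layers digitized onto the same grid (True = material present).
metal1 = np.zeros((8, 12), dtype=bool)
metal1[2:5, 1:10] = True
via = np.zeros_like(metal1)
via[3, 4:6] = True

overlap = metal1 & via     # Boolean AND of two masks is a single operation
exposed = metal1 & ~via    # metal not covered by a via

def min_width_violations(mask, w):
    """Report rows containing a horizontal run of cells narrower than w."""
    bad = []
    for y, row in enumerate(mask):
        run = 0
        for filled in list(row) + [False]:   # sentinel closes the last run
            if filled:
                run += 1
            else:
                if 0 < run < w:
                    bad.append(y)
                run = 0
    return bad

print(min_width_violations(metal1, 3))   # [] : all metal runs are 9 cells wide
print(min_width_violations(via, 3))      # [3]: the via run is only 2 wide
```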
Method two: edge-based approaches, which instead use edges to represent the regions in each mask layer. This reduces the amount of data needed in general and is less dependent on the mask resolution. Kane and Sahni proposed a one-dimensional systolic architecture for design rule checking of Manhattan structures, with only horizontal and vertical features. The edge files are divided into horizontal edges and vertical edges, and each set is processed independently. Horizontal edges are checked for vertical width errors and horizontal spacing errors; vertical edges are checked for horizontal width errors and vertical spacing errors.
The main disadvantage of this approach is that it requires non-deterministic amounts of hardware. The length of the systolic array is roughly determined by the number of horizontal edges along a horizontal line and this number varies significantly among different parts of the whole design.
Most of the edge-based approaches instead use variations on scanline algorithms. The mask layer data are transformed into an edge file that contains all non-vertical edges in the mask. Each edge is described by a pair of points, (Xmin, Ymin) and (Xmax, Ymax) along with an orientation field indicating if the edge borders the region from above or below. Additional layer information is needed when multiple layers are handled together. The edges are sorted in a canonical order, by non-decreasing order of slope within ymin and xmin, before processing.
A vertical scanline sweeps horizontally across the whole mask and stops only at the X-coordinates of the edge endpoints. Edges that intersect the current scanline are then processed. This approach requires less hardware than the previously discussed approaches. Analog, RF and other exotic design types require analysis of non-Manhattan structures, which is done using the Kirkpatrick–Seidel algorithm, called by its authors “the ultimate planar convex hull algorithm”.
This is an algorithm for computing the convex hull of a set of points in the plane, with O(n log h) time complexity, where n is the number of input points and h is the number of points in the hull. Thus, the algorithm is output-sensitive: its running time depends on both the input size and the output size. The basic idea of the algorithm is a kind of reversal of the divide-and-conquer algorithm for convex hulls of Preparata and Hong, dubbed as “marriage-before-conquest” by the authors.
The traditional divide-and-conquer algorithm splits the input points into two equal parts, e.g., by a vertical line, recursively finds convex hulls for the left and right subsets of the input, and then merges the two hulls into one by finding the “bridge edges”, bitangents that connect the two hulls from above and below.
The Kirkpatrick–Seidel algorithm splits the input as before, by finding the median of the x-coordinates of the input points. However, the algorithm reverses the order of the subsequent steps: its next step is to find the edges of the convex hull that intersect the vertical line defined by this median x-coordinate, which turns out to require linear time.
The points on the left and right sides of the splitting line that cannot contribute to the eventual hull are discarded, and the algorithm proceeds recursively on the remaining points. In more detail, the algorithm performs a separate recursion for the upper and lower parts of the convex hull; in the recursion for the upper hull, the noncontributing points to be discarded are those below the bridge edge, while in the recursion for the lower hull the points above the bridge edge are discarded.
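The following Python sketch illustrates the marriage-before-conquest recursion for the upper hull only. For simplicity the bridge is found by brute force rather than by the linear-time prune-and-search of the actual algorithm, so the O(n log h) bound does not hold here, and distinct x-coordinates are assumed.

```python
def cross(o, a, b):
    """Positive if b lies to the left of the directed line o -> a."""
    return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])

def bridge(pts, left, right):
    """Upper bridge across the median line, by brute force (the real
    algorithm finds it in linear time with prune-and-search)."""
    for p in left:
        for q in right:
            if all(cross(p, q, r) <= 0 for r in pts):  # nothing above line pq
                return p, q

def upper_hull(pts):
    pts = sorted(set(pts))           # assumes distinct x-coordinates
    if len(pts) <= 2:
        return pts
    mid = (pts[len(pts)//2 - 1][0] + pts[len(pts)//2][0]) / 2.0
    left = [p for p in pts if p[0] <= mid]
    right = [p for p in pts if p[0] > mid]
    p, q = bridge(pts, left, right)
    # Marriage-before-conquest: points strictly under the bridge are
    # discarded before recursing on what remains of each side.
    return upper_hull([r for r in left if r[0] <= p[0]]) + \
           upper_hull([r for r in right if r[0] >= q[0]])

print(upper_hull([(0, 0), (1, 3), (2, 1), (3, 4), (4, 0)]))
# [(0, 0), (1, 3), (3, 4), (4, 0)] -- the point (2, 1) is pruned early
```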
The combination of complex rules, more transistors on a die, high-performance requirements and extra levels of wiring presents a formidable challenge to today’s design rule verification technology. Creating an automatic electro-thermal correction system for standard cells enables better heat dissipation early in the construction of the IC layout.
DRC Categories; Digital, AMS, RF
Mixed Signal, Analog, and RF designs fall into a distinctive design rule category with special requirements. A simple example in analog design is matched devices. Analog circuits rely on matched devices having the same electrical characteristics within a very tight tolerance. This requires intensive layout work that goes well beyond matching transistor widths and lengths. To minimize manufacturing variances, matched devices must be laid out close together.
The devices must also be placed symmetrically and in a similar local setting to avoid excessive lithographic distortions and directional manufacturing process differences. Special rules are required, and one of the main difficulties is detecting the devices for analysis. Proximity and symmetry can be checked using traditional rules, but unique, non-Manhattan structures require complex Boolean analysis and connectivity-related considerations.
There are numerous other DRC connectivity-dependent rules that need to be verified. These can generally be characterized as methodology checks. They are often specific to one design type and vary from analog to advanced RF designs. One example is decoupling capacitors. Other examples are coils, special resistors and other unique layout structures that require special treatment.
Using an on-the-fly DRC check system can save significant design time, improve layout quality substantially and achieve much higher yield. During early design stages, unique layout structures can be constructed in better ways, creating a high quality design that is DFM compliant.
Maintaining Design Rule Correctness while Fixing Thermal Characteristics
Modifying layout structures for electro-thermal improvement while maintaining design rule correctness is a complex challenge. Today’s industry standard tools are in a constant race to catch up with the latest manufacturing processes, design requirements and new, challenging physical phenomena.
The suggested system provides new technology using real-time DRC analysis. This method targets fast, accurate, on-the-fly DRC correctness as the structure is being modified. Technically, one method addresses DRC for Manhattan structures. Another approach addresses DRC analysis for non-Manhattan structures; it is aimed at structures and design styles of increased complexity.
DRC correctness for non-Manhattan structures must account for the fact that wires can intersect at arbitrary angles, which requires multiplications and divisions to calculate intersection points. The suggested system uses a derived version of an advanced scanline algorithm to address edge-based structures. There are several considerations within this algorithm:
1. Edge-based processing leads to a fairly wide datapath. For Manhattan structures, we need at least 3 coordinate fields (Xmin, Xmax and Y) for each edge: two X-coordinates for beginning and ending points and one Y-coordinate for the edge.
Scanline rendering is an algorithm for visible surface determination; in DRC analysis its implementation works on a row-by-row basis rather than polygon-by-polygon or pixel-by-pixel. All polygons to be rendered are first sorted by the top y coordinate at which they first appear; then each row, or scan line, of the image is computed using the intersection of the scan line with the polygons at the front of the sorted list, while the sorted list is updated to discard no-longer-visible polygons as the active scan line advances down the picture. (Figures #1, #2)
Figure #1: Scan Line Approach
2. Complex design rules require the generation of intermediate (or “derived”) layers that are Boolean transformations of true layers. When a derived layer is generated in an edge-based algorithm, its edge file is not in canonical order, since the edge with the smallest X-coordinate of its ending point leaves the pipeline first. Thus an intermediate sort is needed.
3. One of the key strengths of the suggested approach and algorithm is that it addresses full edge reconciliation. A general method to obtain computational parallelism in DRC is to cut each mask layer into pieces and process each piece individually. This is possible both in bitmap approaches and in edge-based algorithms. Spurious width/spacing errors, however, can be generated by the DRC system because some regions are split into pieces. Sewing regions back together imposes nontrivial performance overhead (especially for edge-based approaches) when the mask is divided into many small pieces.
The system’s approach assumes a virtual scanline that passes over the mask layers, checking relevant design rules as it goes. As the scanline passes over the mask, it takes note of features like edges and their endpoints. In order to check vertical spacing or widths, the scanline will be oriented vertically, as shown in Figure #2 and will scan horizontally across the mask. In order to check horizontal spacing or widths, the mask is rotated by 90 degrees counterclockwise and the scanline passes over it again.
Our method uses an edge-endpoint-based representation. All the edges in a mask layer are categorized into horizontal edges and vertical edges and their endpoints are processed separately. By doing this, we can solve the problems mentioned above. From this point on, we will only talk about how to process the endpoints associated with horizontal edges. Endpoints associated with vertical edges will be processed in exactly the same way after the mask is rotated.
Figure #2: Scan Line Analysis
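As an illustration of the virtual scanline idea, the simplified Python sketch below works on whole Manhattan rectangles rather than the edge-endpoint pipeline described above: a vertical line sweeps left to right and, at each event x, checks the y-intervals it cuts for width and spacing violations.

```python
# Minimal sketch, assuming Manhattan rectangles given as (x1, y1, x2, y2).
def scanline_checks(rects, min_width, min_space):
    """Vertical scanline sweeping left to right; at every event x it checks
    the y-intervals cut by the line for width and spacing violations."""
    events = sorted({r[0] for r in rects} | {r[2] for r in rects})
    violations = []
    for x in events:
        # y-intervals of all shapes the scanline currently intersects
        cuts = sorted((r[1], r[3]) for r in rects if r[0] <= x < r[2])
        merged = []
        for lo, hi in cuts:                  # merge overlapping intervals
            if merged and lo <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], hi)
            else:
                merged.append([lo, hi])
        for lo, hi in merged:
            if hi - lo < min_width:
                violations.append(("width", x, lo, hi))
        for a, b in zip(merged, merged[1:]):
            if b[0] - a[1] < min_space:
                violations.append(("space", x, a[1], b[0]))
    return violations

rects = [(0, 0, 10, 2), (0, 5, 10, 6), (12, 0, 20, 8)]
print(scanline_checks(rects, min_width=2, min_space=4))
# [('width', 0, 5, 6), ('space', 0, 2, 5)]
```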
Thermal Characterization and Analysis
Our system characterizes thermal behavior according to specific guidelines and rules. External thermal analysis results can also be read in as supporting data. Once violations are defined in the initial step, the system learns the physical structure and builds a database of thermal improvements.
When the designer requests a thermal analysis, the system performs a series of initial analyses to categorize the types of geometrical cases submitted. Each case is examined according to its geometrical, connectivity and DRC attributes. The system catalogs each case according to its type and nature. Mixed cases are placed in a special category to be handled later within wider ranges of analysis.
Analysis – After full categorization of the area submitted for thermal improvement, the system performs the actual analysis of possible electro-thermal modifications. A few methods are used to analyze the IC layout for a wide variety of changes while maintaining the process design rules. These methods are divided according to category type.
For example, the CONNECTIVITY category requires a specific method to identify electrical connections between polygons according to the correlated rules. As the system analyzes an IC layout region for CONNECTIVITY, it also considers the GEOMETRICAL rules, since both categories have to comply. These layout cases are considered especially complex due to the wide range of considerations and constraints that the tool has to check in real time.
DFM design rules belong to a separate category of geometrical analysis, and a dedicated method therefore addresses them separately. A design rule check is performed for all IC layout objects in the selected area in order to maintain thermal and process correctness.
For example, the typical space check on a specific layer is equivalent to a width check on the inverse of that layer. Other types of checks, such as minimum overlap, can all be transformed into width checks in a similar way. This does not mean, however, that the design must be transformed into these intermediate forms before checking the rules. Instead, when multiple layers are present, all rules and derived rules associated with those layers are checked simultaneously. Design rule checking is also a local problem, defined over adjacent regions.
This locality is utilized by means of pattern matching over a two-dimensional area. In an edge-based approach, it is utilized by processing only adjacent true edges along an advanced, proprietary sweep line mechanism. The same holds in our edge-endpoint-based approach. Here adjacency refers not only to adjacency within one layer, but also to adjacency among different layers and derived layers that are related via conditional rules.
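The space-equals-width-on-the-inverse equivalence can be illustrated with a small sketch using the shapely library (an illustration only, not the system's proprietary engine): a minimum-width test implemented as a morphological opening, applied once to the layer and once to its complement.

```python
from shapely.geometry import box
from shapely.ops import unary_union

def min_width_ok(shape, w):
    """A region narrower than w vanishes under a morphological opening
    (erode by w/2, dilate back); mitre joins keep rectilinear corners exact."""
    opened = shape.buffer(-w / 2, join_style=2).buffer(w / 2, join_style=2)
    return shape.difference(opened).area < 1e-9

metal = unary_union([box(0, 0, 10, 1), box(0, 3, 10, 4)])  # two parallel wires
print(min_width_ok(metal, 0.8))    # True: both wires are 1 unit wide
print(min_width_ok(metal, 1.5))    # False: the wires are narrower than 1.5

# Space check = width check on the inverse: invert the layer inside a
# bounding window, then the 2-unit gap between the wires fails a 3-unit rule.
window = box(-5, -5, 15, 10)
print(min_width_ok(window.difference(metal), 3.0))  # False
```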
The suggested system is able to find all possible electro-thermal improvements while maintaining design rule and connectivity correctness, using a few types of sweep-line-based algorithms. In this way all types of geometries can be analyzed. This approach was developed to cover complex design types such as Analog, AMS and RF design styles.
The system identifies all vertical, horizontal, angled and conic layout objects using a fast-access database engine. After rapid identification, the objects are fed into the geometrical engine for processing according to their attributes and characteristics. An example is shown in Figure #3. The system’s geometric engine defines a virtual analysis box surrounded by four objects. That is, if we expand the virtual analysis box by the minimum space d, the expanded region (the dotted box) will intersect all four objects.
The geometric engine then performs a series of computations to analyze several design rule categories, starting at the basic level of minimum spacing and width and ending with DRC and CONNECTIVITY, according to the relevant cases. The geometric engine performs a wide range of Boolean and metric operations on the fly, achieving accurate analysis in real time. The system then presents the designer with a series of error markers and highlighted zones, guiding the construction of an accurate, design-rule-correct IC layout.
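A minimal sketch of the expanded-analysis-box test, assuming axis-aligned bounding boxes and invented coordinates:

```python
def expand(bbox, d):
    """Grow an (x1, y1, x2, y2) box by the minimum-space rule d."""
    x1, y1, x2, y2 = bbox
    return (x1 - d, y1 - d, x2 + d, y2 + d)

def intersects(a, b):
    """Open-interval overlap test for two axis-aligned boxes."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

analysis_box = (10, 10, 20, 20)
neighbors = [(21, 12, 25, 16), (4, 4, 9, 9), (12, 22, 18, 30)]
d = 2  # minimum-space rule for the layer under test
candidates = [n for n in neighbors if intersects(expand(analysis_box, d), n)]
print(candidates)  # only the shapes close enough to need a spacing check
```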
During the analysis, intermediate layers are needed for advanced design rule checks. For example, a length-dependent rule required for a METAL layer usually involves many design rules that must correlate with one another. Unique derived layers are generated via Boolean and/or metric operations for checking the desired design rules. Object points can then be selected and analyzed using typical edge-based techniques; this derived-layer method significantly increases the capability to handle complex rule cases.
The system’s geometric engine incorporates several types of algorithms and methods to identify, categorize and analyze a wide range of possible electro-thermal corrections while maintaining design rule and connectivity correctness. We mention here five main methods and approaches.
Dot Throwing
To identify physical structures with better thermal behavior, simulated dots are generated in the layout and the effect of each dot is tallied. This process is repeated a sufficient number of times to produce an acceptable estimate of an object’s area, using Monte Carlo error analysis.
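A minimal Monte Carlo sketch of dot throwing for area estimation, assuming a single polygon and uniformly random dots (pure Python, with a ray-casting point-in-polygon test):

```python
import random

def point_in_polygon(pt, poly):
    """Ray casting: an odd number of edge crossings to the right = inside."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                inside = not inside
    return inside

def dot_throwing_area(poly, bbox, n=100_000, seed=1):
    """Throw n random dots into bbox; the hit fraction estimates the area."""
    random.seed(seed)
    x1, y1, x2, y2 = bbox
    hits = 0
    for _ in range(n):
        p = (random.uniform(x1, x2), random.uniform(y1, y2))
        hits += point_in_polygon(p, poly)
    return hits / n * (x2 - x1) * (y2 - y1)

# L-shaped polygon of exact area 19; the estimate converges as n grows.
l_shape = [(0, 0), (10, 0), (10, 1), (1, 1), (1, 10), (0, 10)]
print(dot_throwing_area(l_shape, (0, 0, 10, 10)))
```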
Size-based Operations
A combination of Boolean operations and DRC-type sizing operations is used to calculate overall polygon area. This algorithm requires one sizing and one Boolean operation for each polygon size. Since maximum object sizes are typically large, the algorithm is very compute-intensive, and sizing polygons in a hierarchical layout by large amounts essentially destroys the underlying layout hierarchy. Both Boolean and sizing operations are based on a scan-line or other sweep-line-derived algorithm.
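For illustration, the sketch below uses shapely to realize one sizing plus one Boolean operation per defect size, estimating the region where a circular defect of radius r would short two hypothetical parallel wires:

```python
from shapely.geometry import box

# Two parallel wires with a 2-unit gap; a circular defect of radius r
# shorts them wherever the two wires, each oversized by r, overlap.
wire_a = box(0, 0, 20, 1)
wire_b = box(0, 3, 20, 4)

for r in (0.5, 1.0, 1.5, 2.0):
    # one sizing (buffer) and one Boolean (intersection) per defect size
    short_region = wire_a.buffer(r).intersection(wire_b.buffer(r))
    print(r, round(short_region.area, 2))  # area grows once r exceeds 1.0
```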
Voronoi Diagram
This method is based on calculating Voronoi diagrams for each design layer; these diagrams are a partitioning of the plane that encodes nearest-neighbor information for the layout shapes on a layer. Constructing the Voronoi diagrams and calculating object areas can both be performed by a single scan-line algorithm, for example.
The critical area curve is computed completely, for all polygon sizes. This approach can also cover connectivity-based area calculations and sampling. Accounting for connectivity strongly increases accuracy. For instance, when two shapes are “opened”, connectivity analysis will determine whether the two shapes cause no open at all due to redundancy, a single open because they belong to the same net, or two opens because they belong to different nets.
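A small sketch using scipy's Voronoi construction (standing in for the scan-line construction described above): the ridge_points array directly yields neighboring-site pairs, so nearest-neighbor spacing can be screened without comparing all O(n^2) pairs. The coordinates are invented.

```python
import numpy as np
from scipy.spatial import Voronoi

# Via centers on a layer. Voronoi ridges join nearest-neighbor sites, so
# candidate spacing violations can be screened from ridge_points alone.
pts = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0], [2.0, 1.4]])
vor = Voronoi(pts)

min_space = 2.5
for i, j in vor.ridge_points:
    d = np.linalg.norm(pts[i] - pts[j])
    if d < min_space:
        print(f"sites {i} and {j} are only {d:.2f} apart (< {min_space})")
```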
Delaunay Triangulation
The suggested system includes a new algorithm for constrained Delaunay triangulation. The algorithm operates on sets of points and constraining edges. It has various applications in geographic information systems (GIS), for example iso-line triangulation or the triangulation of polygons in a land cadastre. Our algorithm uses a sweep-line paradigm combined with Lawson’s legalization. An advancing front moves along with the sweep-line.
It separates the triangulated and non-triangulated regions of interest. Our algorithm triangulates points and constraining edges simultaneously, and thus avoids the time-consuming location of triangles containing constraining edges that other approaches require.
The implementation of the algorithm is also considerably simplified by introducing two additional artificial points. Experiments show that the presented algorithm is among the fastest constrained Delaunay triangulation algorithms available today.
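The constrained, sweep-line algorithm itself is proprietary; as a stand-in, the sketch below uses scipy's unconstrained Delaunay triangulation on invented points to show the neighbor information a triangulation exposes.

```python
import numpy as np
from scipy.spatial import Delaunay

pts = np.array([[0, 0], [4, 0], [4, 3], [0, 3], [2, 1.4], [1, 2.2]])
tri = Delaunay(pts)

# Collect the unique edges of the triangulation; each edge joins a pair
# of natural neighbors, the adjacency the geometric engine exploits.
edges = set()
for a, b, c in tri.simplices:
    for i, j in ((a, b), (b, c), (c, a)):
        edges.add((min(i, j), max(i, j)))
for i, j in sorted(edges):
    print(i, j, round(float(np.linalg.norm(pts[i] - pts[j])), 2))
```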
The geometric engine is targeted to handle a wide range of design rule categories. Among these categories are geometrical rules, connectivity-based rules, DFM rules and reliability rules.
One of the most effective algorithms incorporated in this technology is a combination of the Delaunay and Voronoi approaches (Figure #4). This innovative, combined approach pairs high accuracy (Voronoi) with high performance (Delaunay) to provide fast results for interactive design rule checking.
Fortune’s Algorithm
Another approach, used as a derived development, is Fortune’s algorithm: a sweep line algorithm for generating a Voronoi diagram from a set of points in a plane using O(n log n) time and O(n) space. The algorithm maintains both a sweep line and a beach line, which both move through the plane as the algorithm progresses.
The sweep line is a straight line, which we may by convention assume to be vertical and moving left to right across the plane. At any time during the algorithm, the input points left of the sweep line will have been incorporated into the Voronoi diagram, while the points right of the sweep line will not have been considered yet.
The beach line is not a line, but a complex curve to the left of the sweep line, composed of pieces of parabolas; it divides the portion of the plane within which the Voronoi diagram can be known, regardless of what other points might be right of the sweep line, from the rest of the plane. For each point left of the sweep line, one can define a parabola of points equidistant from that point and from the sweep line; the beach line is the boundary of the union of these parabolas.
As the sweep line progresses, the vertices of the beach line, at which two parabolas cross, trace out the edges of the Voronoi diagram.
The algorithm maintains as data structures a binary search tree describing the combinatorial structure of the beach line, and a priority queue listing potential future events that could change the beach line structure.
These events include the addition of another parabola to the beach line (when the sweep line crosses another input point) and the removal of a curve from the beach line (when the sweep line becomes tangent to a circle through some three input points whose parabolas form consecutive segments of the beach line). Each such event may be prioritized by the x-coordinate of the sweep line at the point the event occurs. The algorithm itself then consists of repeatedly removing the next event from the priority queue, finding the changes the event causes in the beach line, and updating the data structures.
As there are O(n) events to process (each being associated with some feature of the Voronoi diagram) and O(log n) time to process an event (each consisting of a constant number of binary search tree and priority queue operations), the total time is O(n log n). Our geometrical engine uses Fortune’s algorithm mainly for geometrical modifications and design rule analysis. The method was further developed to accommodate RET analysis of polygons.
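A full beach-line implementation is beyond a short example, but the sketch below illustrates the event machinery just described: a circle event is prioritized by the sweep-line x at which the line becomes tangent to the circumcircle of three sites (circumcenter x plus circumradius), and mixed site/circle events are drained from a single priority queue. The site coordinates are invented.

```python
import heapq
import math

def circle_event_x(p1, p2, p3):
    """Sweep-line x at which the circle through three sites fires a circle
    event: circumcenter x plus circumradius, for a left-to-right sweep."""
    (ax, ay), (bx, by), (cx, cy) = p1, p2, p3
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None  # collinear sites: no circumcircle
    ux = ((ax**2 + ay**2) * (by - cy) + (bx**2 + by**2) * (cy - ay)
          + (cx**2 + cy**2) * (ay - by)) / d
    uy = ((ax**2 + ay**2) * (cx - bx) + (bx**2 + by**2) * (ax - cx)
          + (cx**2 + cy**2) * (bx - ax)) / d
    return ux + math.hypot(ax - ux, ay - uy)

# Site events and one candidate circle event in a single priority queue,
# ordered by sweep-line x as in Fortune's algorithm.
sites = [(0.0, 0.0), (2.0, 3.0), (4.0, 1.0)]
queue = [(x, "site", (x, y)) for x, y in sites]
cx = circle_event_x(*sites)
if cx is not None:
    queue.append((cx, "circle", tuple(sites)))
heapq.heapify(queue)
while queue:
    x, kind, data = heapq.heappop(queue)
    print(f"x={x:.3f}: {kind} event {data}")
```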
Auto-Correction – During the extensive geometrical modifications, connectivity- and DRC-oriented analysis is performed on the fly, creating a solution category. In this category the tool stores correction solutions for each thermal enhancement that was created. These solutions are the results of the analysis process mentioned in the previous paragraph.
At the end of the analysis stage, the designer is presented with a suggestion to apply a thermal improvement solution that is DRC and LVS clean. It is the designer’s choice to implement or ignore the suggested auto-correction. The suggested solutions are geometrically, connectivity and DRC compliant.
Interactive Mode
The suggested Electro-Thermal Auto Correction system provides analysis and an Auto-Correction option. Since advanced processes require support for electrical connectivity, physical phenomena and manufacturing considerations, analyzing all of these categories at once is a serious challenge for an interactive technology. The fact that the analysis should alert the user in less than one tenth of a second makes it almost impossible to achieve with current approaches and methods.
To maintain real-time, interactive analysis and modification, the following three subsystems will be developed.
Database Engine – Some database applications require performing a series of actions where certain actions must be performed on some database items before others. Before these actions can be performed on the set of required items, we must work out a sequence that takes all of the dependencies into account.
Using a directed-acyclic-graph-based algorithm, we developed a method to quickly access each layout object and its necessary geometric or DFM-related properties (such as layer, size, and DRC space for oversize/undersize). A directed acyclic graph (DAG) is a directed graph with no directed cycles.
That is, it is formed by a collection of vertices and directed edges, each edge connecting one vertex to another, such that there is no way to start at some vertex v and follow a sequence of edges that eventually loops back to v again. This type of graph may be used to model several different kinds of structure in mathematics and computer science.
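A minimal sketch of dependency ordering over such a DAG, using Kahn's topological sort with hypothetical derived-layer dependencies (the layer names and rule are invented for illustration):

```python
from collections import deque

def topo_order(deps):
    """Kahn's algorithm: deps maps each item to the items it depends on.
    Returns an order in which every item follows its prerequisites."""
    indeg = {n: 0 for n in deps}
    users = {n: [] for n in deps}
    for n, pre in deps.items():
        for p in pre:
            indeg[n] += 1
            users[p].append(n)
    ready = deque(n for n, d in indeg.items() if d == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in users[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    if len(order) != len(deps):
        raise ValueError("cycle detected: not a DAG")
    return order

# Hypothetical dependencies: a width check on a derived layer can only
# run after the Boolean operations that produce that layer.
print(topo_order({
    "metal1": [],
    "via1": [],
    "m1_minus_via": ["metal1", "via1"],
    "width_check": ["m1_minus_via"],
}))
```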
The new database engine is a key factor in the entire system’s fast operation, because large integrated circuits require immediate access to a specific layout region for DRC analysis. The database system must deliver the layout objects within less than a millisecond. Given that today’s chips include billions of transistors, this task is extremely challenging.
Real Time System – A real-time system is typically composed of several tasks with timing constraints. The timing constraint of an operation is usually specified as a deadline, which corresponds to the time by which that operation must complete.
Thus a proper method is essential to ensure that all activities are able to meet their timing constraints. Selecting appropriate methods for scheduling activities is one of the important considerations in the design of a real-time system.
Scheduling of real-time systems has always assumed that the underlying hardware is a single processor system. But current trends indicate that sophisticated real-time applications with high computational demands are emerging. Examples of such systems include automatic tracking systems and telepresence systems.
These applications have timing constraints that are used to ensure high system fidelity and responsiveness, and may also be crucial for correctness in certain applications such as telesurgery. Also, the processing requirements of these systems exceed the capacity of a single processor, and a multiprocessor may be necessary to achieve an effective implementation.
These observations underscore the growing importance of multiprocessors in real-time systems.
The priority of the task to be scheduled may be fixed or dynamic. The rate-monotonic scheduling (RMS) algorithm is an optimal fixed-priority scheduling algorithm. An optimal dynamic-priority algorithm is the earliest-deadline first (EDF) algorithm.
These scheduling algorithms, which are optimal in single-processor systems, can result in arbitrarily low processor utilization in multiprocessor systems.
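As a small illustration of deadline-driven scheduling, the sketch below simulates a non-preemptive, single-processor EDF queue over invented jobs; a production real-time system would of course be preemptive and far more involved.

```python
import heapq

def edf_schedule(jobs):
    """Non-preemptive, single-processor earliest-deadline-first simulation.
    jobs: list of (release_time, execution_time, deadline) tuples."""
    pending = sorted(jobs)                 # by release time
    ready, timeline, t, i = [], [], 0, 0
    while i < len(pending) or ready:
        while i < len(pending) and pending[i][0] <= t:
            r, c, d = pending[i]
            heapq.heappush(ready, (d, c, r))   # priority = earliest deadline
            i += 1
        if not ready:
            t = pending[i][0]              # idle until the next release
            continue
        d, c, r = heapq.heappop(ready)
        timeline.append((t, t + c, r, d))  # (start, finish, release, deadline)
        t += c
        if t > d:
            print(f"deadline miss: job released at {r} finished at {t} > {d}")
    return timeline

# Three jobs as (release, execution, deadline); EDF runs the tightest first.
for slot in edf_schedule([(0, 2, 8), (0, 1, 3), (1, 2, 6)]):
    print(slot)
```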
When dealing with IC layout design, speed of operation plays a key role. The designer is pressured by tight schedules and milestones to complete the layout block(s) in order to meet the target tapeout schedule.
Adding an interactive electro-thermal analyzer and enhancement system is highly desirable functionality, but only if it does not interrupt the designer’s work. If the interactive tool slows the system with its background calculations and computations, it will not be used.
An efficient real-time system is therefore necessary to make the technology worth using.
Geometrical Engine – A library of advanced geometric routines used by higher-level modules to perform their tasks. This module includes state-of-the-art algorithms that are proprietary and have never been published in the scientific literature.
These geometrical functions and routines address a wide variety of IC layout data according to its type, taking into consideration the process’s rules and constraints. The engine supports simple and highly complex topological scenarios according to design types and manufacturing rules. One of its key advantages is its capability for case learning: each newly analyzed case is studied and recorded for future similar cases.
On next encountering a similar scenario, the system will analyze it and suggest a set of solutions based on its previous experience.
Proprietary Rule Deck; A whole world of difference
One of the key strengths of the system is its own rule deck system. This feature enables us to define our own proprietary electro-thermal rules to support complex geometrical, connectivity and thermal definitions and constraints.
The rule deck is Tcl compatible and ASCII based. Adopting the Tcl language enables a robust capability to describe thermal, geometrical and connectivity-based rules as required by advanced nanometer processes. The format provides a flexible way to express complex rules, including error-handling schemes and validation. Furthermore, using a private rule deck, users can easily define thermal rules for debugging purposes, to be checked interactively.
The tool’s robust rule deck system opens a whole world of possibilities for creating, editing and interactively testing custom thermal conditions. Customers can test new thermal scenarios, use a rule set provided by the fabrication plants, or modify rules using the tool’s user-friendly GUI.