Vertical Charge Transport in Junction Charge-Coupled Devices


Thesis

submitted for the degree of doctor at Delft University of Technology, by authority of the Rector Magnificus, prof. dr. J.M. Dirken, to be defended in public before a committee appointed by the Board of Deans on Tuesday 6 October 1987 at 14.00 hours by

Cornelis Leonardus Maria van der Klauw

This thesis has been approved by the supervisor (promotor)

prof. dr. M. Kleefstra

1 Introduction

The advances in semiconductor technology in the past decade have led to the single chip integration of complex signal processing systems. Nowadays, digital circuits containing well over 100,000 transistors are being fabricated. It is not only the decrease in feature size which has made this possible, but also the increase in active chip area thanks to a better controlled fabrication technology. This increase in component count has been a major factor in the realisation of complex digital signal processing chips.

Another factor is the improvement in the electrical properties of the individual components. Transistor switching times and power dissipation have been reduced drastically, resulting in higher operating speeds for given architectures. However, it is the architecture itself that is the third factor which determines the performance of a digital integrated circuit, and which is becoming more important.

In order to perform a certain operation on digital data, such as a multiplication, a number of algorithms can generally be applied. Early signal processing employed a single processor with a relatively large instruction set, working on successively applied data. In this case, the architecture of the hardware is, to a large extent, independent of the algorithm that is carried out by the processor. More dedicated signal processing circuits consist of many processing elements working in parallel. Since the individual processing elements have to perform only a few different operations, they can be fast and relatively simple. The architecture of such a chip is much more closely related to the algorithm. However, large circuits with a large logical depth exhibit a substantial delay from input to output, during which time no other input signals can be applied.

An elegant way to cope with these problems is to apply a systolic array [1]. This consists of a regular grid of identical processing elements, each connected only to its nearest neighbours and all performing logic operations simultaneously. In this way, a very high degree of pipelining and parallelism is achieved. Each cell, however, requires local memory, which contributes to the area consumption. Consequently, there is a search for technologies with an inherently small memory cell size for the realisation of systolic arrays [2].

Charge-Coupled Devices (CCD's) provide such small memory cells and furthermore have a very low power dissipation [3]. Moreover, the clocking scheme of CCD's fits almost naturally into systolic architectures. Therefore, Charge-Coupled Devices appear to be excellent candidates for the implementation of systolic arrays, provided efficient logic functions can also be realised in the same process. It is this need for CCD logic circuitry which initiated the research described in this thesis.

In chapter 2, the concept of the systolic array is highlighted and the operation of Digital Charge-Coupled Logic (DCCL) is discussed [4]. A brief overview is given of the results obtained by several authors in this field, all of whom used surface channel Charge-Coupled Devices, which limited the operating frequency.

In chapter 3, the physics and fabrication technology of Junction Charge-Coupled Devices (JCCD's) are discussed. This type of CCD was first described by Schuermeyer et al. [5], while the first experimental results were reported by Kleefstra [6]. JCCD's are essentially buried channel CCD's and are capable of high speed operation. In addition, JCCD's offer unique possibilities for vertical charge transport through the steering gates. However, their minimum cell size is somewhat larger than that of their MOS counterparts. The research objective is therefore to investigate the feasibility of logic functions in Junction Charge-Coupled Devices based on vertical charge transport, a feature which has not previously been analysed in detail. As a consequence, this thesis is concerned with the physics and application of vertical charge flow rather than with logic systems. Chapter 3 continues with a discussion of vertical charge injection in JCCD's, which was previously applied in analog filters but which is also suited to high speed multiplexing of charge signals, such as those delivered by nuclear physics detectors.

Chapter 4 deals with the analysis and applications of vertical charge flow out of the JCCD (overflow), which takes place whenever surplus charge is present in the channel at a sufficiently high clock voltage. This surplus charge activates the substrate PNP transistor which is inherently present in Junction CCD's. Transistor theory has therefore been added to the conventional electrostatic description of JCCD's [7]. The process of charge overflow is discussed in detail, and the results of simulations at the circuit level and the device level, as well as experimental results, are presented. The application of the substrate transistor as a simple, linear output is also discussed.

Finally, in chapter 5, Junction Charge-Coupled Logic (JCCL) is introduced, based on the previously described principles of vertical charge flow in Junction CCD's. The layout and behaviour of basic logic gates are discussed and a full adder circuit is presented. Although the operating frequency of JCCL devices is considerably higher than that of other CCD logic circuits, the present circuits suffer from a poor noise margin and a relatively high power dissipation. Measures that can strongly improve the performance of JCCL circuits are discussed.

The appendix includes an outline of the device analysis program SETRA, which was used to investigate the behaviour of excess charges in the Junction CCD.


References

[1] H.T. Kung, 'Why Systolic Architectures?', IEEE Computer, vol. 15, no. 1, pp. 37-46, Jan. 1982

[2] J.G. Nash, 'Combinatorial Digital Logic Using Charge-Coupled Devices', IEEE Journal of Solid-State Circuits, vol. SC-17, no. 5, pp. 957-963, Oct. 1982

[3] C.H. Sequin and M.F. Tompsett, 'Charge Transfer Devices', Academic Press, New York, 1975

[4] T.A. Zimmerman, R.A. Allen and R.W. Jacobs, 'Digital Charge-Coupled Logic (DCCL)', IEEE Journal of Solid-State Circuits, vol. SC-12, no. 5, pp. 473-485, Oct. 1977

[5] F.L. Schuermeyer, R.A. Belt, C.R. Young and J.M. Blasingame, 'New Structures for Charge-Coupled Devices', Proc. IEEE, vol. 60, pp. 1444-1445, Nov. 1972

[6] M. Kleefstra, 'First Experimental Bipolar Charge-Coupled Device', Microelectronics, vol. 7, pp. 68-69, Dec. 1975

[7] G.C. Herman, C.D. Hartgring and M. Kleefstra, 'Calculation of Potential Profiles in the Junction Charge-Coupled Device', IEEE Trans. on Electron Devices, vol. ED-25, no. 7, pp. 845-847, July 1978

2 Charge-Coupled Devices in logic circuits

2.1 Introduction

In this chapter, the application of Charge-Coupled Devices in logic circuits is considered from two viewpoints: the device viewpoint and the system viewpoint. First, Charge-Coupled Devices will be introduced and their operation explained, with emphasis on the characteristics of different types of CCD's as far as they are important for logic circuits. The most obvious logic application of CCD's is found in memories, in which information is represented by charge packets in two or more discrete quantities. In general, a logical '1' corresponds to a maximum charge packet, while a logical '0' is represented by the absence of charge, called an empty packet.

This representation is also employed in binary CCD logic. The basic digital CCD logic gates will be presented as they are implemented with MOS surface channel CCD's, and their characteristics will be discussed in terms of speed, power dissipation and area consumption. So far, MOS surface channel technology has been the only one in which digital CCD logic gates have been realised, mainly because of its relatively simple processing compared with buried channel CCD's. Two alternative approaches have been demonstrated in practice. One is to increase information density by distinguishing more than two logical levels in the charge packets, the so-called multiple-valued logic. The other is to combine Charge-Coupled Devices with conventional n-MOS circuitry, which exploits the advantages of both.

The way in which the MOS surface channel CCD logic gates differ from the logic gates based on Junction Charge-Coupled Devices, which form the main subject of this thesis, is briefly described.

The systolic array will be presented as an attractive architecture for carrying out digital signal processing. In a systolic array, a large number of processing elements work in parallel while the highest degree of pipelining is achieved. Therefore, the algorithm which is carried out by the systolic array is closely related to the architecture of the hardware, although systolic algorithms may be verified on conventional computers. The individual processing elements consist of random or structured logic circuits with additional memory and synchronisation, or the systolic concept may be carried through into the processing elements themselves. The lowest level at which the systolic array can be arranged is the bit-level systolic array, in which only one bit is handled in each processing element at a time. It is this bit-level systolic architecture that fits almost naturally into the operation and implementation of Charge-Coupled Devices.

Figure 2.1: Cross-section of a MOS surface channel CCD.

At the end of this chapter, the strong coherence between the operation of bit-level systolic arrays and Charge-Coupled Devices will be recapitulated.

2.2 Charge-Coupled Devices

A cross-section of a surface channel Charge-Coupled Device (SCCD) is given in fig. 2.1. It consists of a number of closely spaced (in practice overlapping) electrodes, called gates, on a common p-type substrate, all of which are insulated by a thin oxide layer. An input and output diode, the source and drain respectively, are formed by n+ diffusions.

With the substrate at ground potential, the majority carriers will be repelled locally from the oxide-silicon interface if a positive voltage is applied to one of the gates. In this way, a depletion layer is created with a typical dimension of a few micrometres. For higher gate voltages, the steady state minority carrier concentration at the surface may become higher than the equilibrium hole concentration, and an inversion layer is formed with a thickness of typically 10 nanometres. The electrons in this inversion layer can be supplied by thermal or optical generation in the bulk and at the surface, or electrically through the n+ regions at the ends of the SCCD. For a gate which is not immediately near the source or drain, only the first option remains if adjacent gates are kept at ground potential. At room temperature and with no incident light, it can take tens of seconds for the inversion layer to be built up to the steady state situation by this dark current, depending on oxide thickness and fabrication process parameters. So a local potential maximum can be maintained at the silicon surface for some time, and this potential well for electrons can be moved along the surface by applying overlapping voltage pulses to successive gates. This is depicted in fig. 2.2. Electrons can be present in the potential well, thereby decreasing the potential maximum at the surface, usually denoted by $\Phi_s$. As long as $\Phi_s$ does not drop below the potential under adjacent gates, the electron charge packet will be localised and it can be transferred as a whole. The minimum number of independent gate voltages (clock phases) required is three for this simple CCD, but incorporating a preferential transport direction by technological means makes two-phase charge transport possible.
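Why three phases are needed can be illustrated with a minimal, idealised sketch in Python; it is not a device simulation. It assumes that gate i is driven by phase i mod n, that a packet moves completely into the neighbouring gate whose phase is raised next, and that a packet with two such neighbours is split in half to expose the ambiguity. The names `clock_step` and `run` are hypothetical.

```python
def clock_step(charges, high_phase, n_phases):
    """One clock step: charge under gates of `high_phase` moves to the
    adjacent gate(s) driven by the next phase, which is raised while the
    current phase falls. All other gates are low and act as barriers."""
    next_phase = (high_phase + 1) % n_phases
    moved = [0.0] * len(charges)
    for i, q in enumerate(charges):
        if q == 0.0:
            continue
        # Neighbouring gates whose potential well opens in this step.
        targets = [j for j in (i - 1, i + 1)
                   if 0 <= j < len(charges) and j % n_phases == next_phase]
        if not targets:          # no open neighbour: the charge stays put
            moved[i] += q
            continue
        for j in targets:        # symmetric potentials split the packet
            moved[j] += q / len(targets)
    return moved


def run(charges, n_phases, steps):
    """Apply `steps` successive clock steps, starting with phase 0 high."""
    phase = 0
    for _ in range(steps):
        charges = clock_step(charges, phase, n_phases)
        phase = (phase + 1) % n_phases
    return charges


three = run([1.0, 0, 0, 0, 0, 0, 0], n_phases=3, steps=3)
print(three)  # [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
two = run([0, 0, 1.0, 0, 0], n_phases=2, steps=1)
print(two)    # [0.0, 0.5, 0.0, 0.5, 0.0]
```

With three phases the packet advances one gate per phase step, i.e. one three-gate cell per clock cycle. With two phases and no built-in barrier both neighbours open at the same moment and the transport direction is undefined, which is why two-phase operation relies on a technologically created preferential direction.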

Charge can be brought into the CCD by fixing the source voltage and creating a potential well under the first gate, and it can be detected at the drain if the drain voltage is high enough to withdraw the electrons from the potential well under the last gate. Several sophisticated input and output structures exist, most of them concerned with the linearity of the input and output voltage-charge relationships. A detailed description can be found in textbooks on Charge-Coupled Devices [1] [2] [3].

Another class of Charge-Coupled Devices employs majority carrier transport, i.e. electrons in an n-type layer which is locally fully depleted in order to separate individual charge packets. In these devices, the electrostatic potential maximum lies some distance away from the surface, which has a beneficial influence on charge transport. These CCD's are known as buried channel Charge-Coupled Devices (BCCD's) and they include the Junction Charge-Coupled Device described in chapter 3. A few fundamental characteristics of Charge-Coupled Devices will be discussed next.

Figure 2.2: Schematic representation of three-phase charge transport in an

The charge handling capability of a CCD is the amount of charge that can be transported in a single transfer per unit of gate area, normally expressed in electrons per square centimetre. The charge handling capability depends on the clock voltage swing and varies between $10^{11}$ and almost $10^{13}$ e/cm$^2$ in practical devices. The latter value holds for surface channel devices, where this maximum is determined by the breakdown electric field of the gate oxide, while the practical limit in buried channel CCD's is fixed by the silicon breakdown field. For SCCD's, the relation between the charge handling capability and the applied gate voltage is nearly linear, since the surface potential $\Phi_s$ varies almost linearly with the gate voltage $V_g$ and with the signal charge $Q$:

$$\Phi_s = V_g' + V_0 - \sqrt{2V_g'V_0 + V_0^2} \qquad (2.1)$$

with

$$V_g' = V_g - V_{FB} - Q/C_i$$

and

$$V_0 = q N_A \varepsilon_{si} / C_i^2$$

Here, $V_{FB}$ is the flat-band voltage of the MOS structure, $Q$ is the signal charge per unit of area, $C_i$ is the oxide capacitance per unit of area, and $V_0$ is a characteristic voltage of the depletion layer, which depends on the substrate doping $N_A$. Furthermore, $q$ and $\varepsilon_{si}$ are the electronic charge and the silicon dielectric constant, respectively. The nonlinearity is determined by the square root term, whose influence may be small in practice, especially for low substrate impurity concentrations ($10^{14}$ cm$^{-3}$) and thin oxides (100 nm), which result in small values of $V_0$. This linear behaviour is of great importance since it facilitates the implementation of (multiple-valued) logic circuits with SCCD's. In buried channel CCD's, the amount of charge varies nonlinearly with the applied gate voltage and depends on technological parameters that are not easily controlled, such as the epilayer thickness and impurity concentration.
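Eq. (2.1) can be evaluated numerically to show how small $V_0$ is for the doping and oxide thickness quoted above, and how nearly linearly $\Phi_s$ falls with $Q$. The sketch assumes textbook material constants (relative permittivities of 11.7 for Si and 3.9 for SiO2) and a hypothetical effective gate voltage $V_g - V_{FB}$ of 10 V. These values are illustrative only.

```python
import math

Q_E = 1.602e-19          # electronic charge (C)
EPS0 = 8.854e-14         # vacuum permittivity (F/cm)
EPS_SI = 11.7 * EPS0     # silicon permittivity (F/cm), assumed textbook value
EPS_OX = 3.9 * EPS0      # SiO2 permittivity (F/cm), assumed textbook value


def oxide_capacitance(t_ox_cm):
    """Oxide capacitance per unit of area, C_i = eps_ox / t_ox (F/cm^2)."""
    return EPS_OX / t_ox_cm


def v0(n_a_cm3, t_ox_cm):
    """V_0 = q N_A eps_si / C_i^2 (V)."""
    return Q_E * n_a_cm3 * EPS_SI / oxide_capacitance(t_ox_cm) ** 2


def surface_potential(vg_minus_vfb, electrons_cm2, n_a_cm3, t_ox_cm):
    """Eq. (2.1) with V'_g = V_g - V_FB - Q/C_i; Q given in electrons per cm^2."""
    c_i = oxide_capacitance(t_ox_cm)
    vg_eff = vg_minus_vfb - Q_E * electrons_cm2 / c_i
    v_0 = v0(n_a_cm3, t_ox_cm)
    return vg_eff + v_0 - math.sqrt(2 * vg_eff * v_0 + v_0 ** 2)


N_A = 1e14     # substrate doping (cm^-3), value quoted in the text
T_OX = 100e-7  # 100 nm oxide, expressed in cm

print(f"V0 = {v0(N_A, T_OX) * 1e3:.1f} mV")
phis = [surface_potential(10.0, n, N_A, T_OX) for n in (0.0, 0.5e12, 1.0e12, 1.5e12)]
print([round(p, 2) for p in phis])
```

Under these assumptions $V_0$ is roughly 14 mV. The square-root term then shifts $\Phi_s$ by only a few tenths of a volt over the whole range, and each step of $0.5 \times 10^{12}$ e/cm$^2$ lowers $\Phi_s$ by nearly the same amount (about 2.2 V).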

Another important figure of merit of CCD's in general is the transfer inefficiency $\epsilon$, defined as the fraction of the total charge packet that stays behind after a single transfer. Charge losses that are independent of the amount of charge are usually denoted by $\delta$. Incomplete charge transfer can be attributed to several causes. In surface channel CCD's the main causes are the presence of interface traps at the Si-SiO2 interface and the limited speed of charge transfer. In buried channel CCD's, other causes, such as bulk traps and parasitic potential wells, may determine the transfer inefficiency. Traps usually lead to fixed charge losses $\delta$ that, at a given clock frequency, depend only on the number of empty charge packets (zeroes) preceding the first nonzero signal charge. In SCCD's these losses can be reduced significantly by introducing a small bias charge, the so-called fat zero. Typical values of $\epsilon$ range between $10^{-5}$ and $10^{-2}$, depending strongly on processing.
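To give the quoted range of $\epsilon$ some meaning, the sketch below computes the fraction of a packet that survives a number of transfers under the simplest assumption of a constant proportional loss per transfer, $(1-\epsilon)^N$. The fixed losses $\delta$ are ignored, and the 1000 transfers are a hypothetical example.

```python
def surviving_fraction(eps, n_transfers):
    """Fraction of the original packet left after n transfers, assuming
    a constant proportional loss eps per transfer (fixed losses ignored)."""
    return (1.0 - eps) ** n_transfers


for eps in (1e-5, 1e-4, 1e-3, 1e-2):
    print(f"eps = {eps:.0e}: {surviving_fraction(eps, 1000):.3g} left after 1000 transfers")
```

For small $N\epsilon$ the result is close to $1 - N\epsilon$, so at the upper end of the range a long register loses nearly the whole packet, while at the lower end only about 1% is lost after 1000 transfers.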

The dark current was already mentioned as the source of the inversion charge. It determines the maximum time for which a local potential well can be maintained and the individual charge packets can still be distinguished. For carefully processed CCD's, the average dark current is a few nA per square centimetre.

More fundamental is the difference in operating speed between SCCD's and BCCD's. The transfer of charge between adjacent potential wells is governed by three mechanisms:

• fringing fields induced by the clock

• self-induced fields

• diffusion

Fringing fields arise from potential gradients in the channel in the direction of charge transport that result from different voltages on subsequent gates and from technologically created gradients. A step-function-like gradient would only remove carriers that had reached this step by other mechanisms. When the transport channel lies some distance away from the gates, as is the case in buried channel devices, the voltage difference between gates causes a potential gradient that extends over a considerably longer distance than in surface channel devices. In BCCD's this mechanism is the main factor that enables them to be operated at frequencies of several hundred MHz. Self-induced fields arise from the mutual repulsion of identical carriers. In SCCD's this mechanism dominates for large charge packets, but the last part of a packet, and small packets in general, are transferred by diffusion and small fringing fields. When low transfer inefficiencies are required, this limits the operating frequency to a few MHz for gate lengths of about 10 µm.

Figure 2.3: Layout of a floating diffusion inverter.

2.3 Use of CCD's in memories

Immediately after their invention, it was apparent that CCD's could be used as serially organised dynamic memories. In 1971, Engeler et al. [4] reported a 14-bit shift register based on the surface charge transistor, which was essentially a Charge-Coupled Device with electrodes in two separate layers. Especially at high operating frequencies, incomplete charge transport required refresh circuits to be incorporated in these delay lines. Later, such refresh circuits were also used in CCD logic circuits, as they are basically inverters. A refresh circuit is depicted schematically in fig. 2.3 [5].

A charge packet is transferred downwards in the left CCD to a gate under which a diffusion is placed (n-type for n-channel devices). If this diffusion is left floating, it will take on the surface potential $\Phi_s$ that would have resulted without this diffusion. When an electron charge packet arrives at this point, the potential of the diffusion drops. This potential drop can be read out nondestructively with a high impedance load, which is the input gate of the right CCD. As a result, the input transistor is switched off and no charge packet is brought into this CCD. Conversely, a logical '1' is generated if no charge arrives at the floating diffusion and its potential thus follows the surface potential. The input charge packet is drained. A limitation on the use of this structure is posed by the stray capacitance of the floating diffusion, which reduces the voltage swing caused by either a charge packet or the voltage on the gate on top.

Figure 2.4: Layout of a floating gate inverter.
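The effect of the stray capacitance on the floating diffusion can be made concrete with a simple charge-on-a-capacitor estimate, $\Delta V = Q/(C_{fd} + C_{stray})$. This is a sketch under assumed values: the packet size and capacitances are hypothetical, and the capacitive coupling from the gate on top is not modelled.

```python
Q_E = 1.602e-19  # electronic charge (C)


def floating_node_swing(n_electrons, c_node_f, c_stray_f):
    """Voltage drop of a floating diffusion when a packet of n_electrons
    arrives, treating the node and its stray load as one lumped capacitance."""
    return n_electrons * Q_E / (c_node_f + c_stray_f)


# Hypothetical packet of 5e5 electrons on a 20 fF diffusion.
for c_stray in (0.0, 20e-15, 60e-15):
    dv = floating_node_swing(5e5, 20e-15, c_stray)
    print(f"stray {c_stray * 1e15:4.0f} fF -> swing {dv:.2f} V")
```

Doubling the total node capacitance halves the swing available to switch the input gate of the next CCD, which is the limitation referred to above.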

Another charge regenerator, the floating gate amplifier, is shown in fig. 2.4. In this case the charge packet arrives at a so-called master gate. This gate is not directly connected to the clock, but can instead be connected to a positive voltage $V_+$ when the gate voltage $V_g$ of the MOS transistor is raised. The same holds for the connected slave gate in the other CCD channel. Before a charge packet arrives at the master gate, the master-slave combination is precharged to $V_+$ and then left floating when $V_g$ drops. On the arrival of a full charge packet, the surface potential under the master is lowered along with the potential of the floating gates, master and slave. Consequently, the surface potential under the slave electrode is decreased, and this blocks the transport of a charge packet in the slave CCD which was supplied by the source. The original input packet is drained and an inverted signal is transferred in the right CCD.


With the aid of eq. (2.1), expressions can be derived for the change in surface potential of the slave for a given amount of charge in the master [6] [7]. For small charge packets this relation is nearly linear, but its range depends on the precharge voltage $V_+$ and on such geometrical factors as the area ratio of the master and slave electrodes and the oxide thickness.

Two possible ways to arrange CCD memories are shown in fig. 2.5: the serpentine structure and the serial-parallel-serial structure (SPS). The first is merely a series of Charge-Coupled Devices with charge regenerators inserted in between. In an SPS structure, the number of transfers is reduced drastically at the cost of a more complex clocking scheme. Here, N bits of data are shifted into a relatively high-speed CCD. Next, these data are transferred to N parallel CCD's, each with M stages, running at an N-times lower clock frequency than the serial input CCD. After the data have passed the parallel CCD's, they are read out by another serial CCD at high frequency. Every bit passes only N + M cells, compared with N × M cells in a purely serial memory such as the serpentine structure.

Owing to the reduced number of transfers and the relatively low clock frequency in the parallel CCD's, the overall transfer losses can be so low that charge regenerators are not necessary. Recently, a high density (300 kb) memory for video applications employing this concept has been reported [8].
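The advantage of the SPS organisation follows directly from the cell counts N × M and N + M. The sketch below compares them for a hypothetical 64 × 64 array and combines them with a proportional-loss model $(1-\epsilon)^n$ at an assumed $\epsilon = 10^{-4}$. It treats each cell passage as a single transfer, which is a simplification, since a multi-phase cell involves several transfers.

```python
def cells_passed(n, m):
    """Cells passed by one bit: serpentine (purely serial) versus SPS."""
    return {"serpentine": n * m, "sps": n + m}


def surviving_fraction(eps, transfers):
    """Proportional-loss model: fraction of the packet left after `transfers`."""
    return (1.0 - eps) ** transfers


counts = cells_passed(64, 64)
for name, count in counts.items():
    left = surviving_fraction(1e-4, count)
    print(f"{name:10s}: {count:5d} cells, {left:.3f} of the packet left")
```

Under these assumptions the serpentine loses about a third of each packet, whereas the SPS loses only about 1%, which illustrates why regenerators can be omitted in the SPS arrangement.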

Another approach, which increases the information density in CCD memories, is to distinguish more than two charge levels in CCD potential wells, which is called multi-level storage. Yamada et al. [9] constructed a CCD memory that employs four levels of charge: a fat zero and three equally distributed levels up to the maximum. The refresh circuits and the input and output circuitry are more complex than in the binary case, but since two bits can be stored in each potential well, an improvement in storage density can be obtained.

2.4 Use of CCD's in logic circuits

As early as 1972, Tompsett suggested the elementary operations NAND and NOR based on charge regenerators in Charge-Coupled Devices [5].

Figure 2.5: Two common CCD memory arrangements: (a) serpentine structure, (b) serial-parallel-serial structure (SPS).

These charge regenerators employed a floating diffusion (fig. 2.3). However, there was no direct interaction of charge packets, as the logic functions were realised by series and parallel arrangements of input gates, controlled by the presence or absence of charge at the floating diffusions. In the same year, Mok and Salama presented a structure that computes the AND and OR functions of two input charge packets by direct interaction, which is given in fig. 2.6 [10].

Figure 2.6: CCD AND/OR logic gate; dashed lines: potential barriers.

The dashed lines indicate a built-in potential barrier for charge packets, which means that the surface potential $\Phi_s$ is locally decreased for an n-channel CCD. This can be realised by ion implantation or by locally increasing the oxide thickness under the gate. Another method is to insert a separate gate with a DC offset voltage with respect to the following gate. Either way, two-phase charge transport is made possible, since the barrier prevents the carriers from flowing back. For transport with three or more clock phases, this barrier is created automatically by the extra gates when they are at low voltage, and the same scheme of fig. 2.6 is applicable. When clock phase $\Phi_A$ is high, input signals a and b, represented by full or empty charge packets (electrons), arrive in the structure at gates G1 and G2. Next, they interact at gate G3 when $\Phi_B$ is high and $\Phi_A$ is low. Only if the charge packets represent two logical ones will the surface potential under gate G3 drop below the surface potential created by the barrier of gate G4, so that surplus charge flows under this gate. Charge is prevented from flowing back to G1 and G2, as the surface potential under these gates is lower than the barrier potential of G4 when $\Phi_B$ is high. The resulting charge packet under G4 represents the logical AND function of inputs a and b, while the remaining charge packet is a logical '1' when one or two inputs are '1'. This OR result is shifted to G5 when $\Phi_A$ is high. So, with only five gates, a combined AND/OR circuit can be realised, while for inversion one of the previously described regenerators must be used.
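The charge-domain operation of fig. 2.6 can be captured in an idealised model in which charge is counted in units of one full packet, G3 holds exactly one full packet and everything above that spills over the G4 barrier. This is a sketch only: incomplete transfer, barrier tolerances and the clock sequence are ignored, and the function name `and_or_gate` is hypothetical.

```python
FULL = 1.0  # charge of a logical '1' packet, in units of one full packet


def and_or_gate(a, b, g3_capacity=FULL):
    """Idealised AND/OR gate of fig. 2.6; inputs a and b are 0 or 1.
    The packets merge under G3, which holds at most one full packet;
    any surplus spills over the G4 barrier."""
    total = (a + b) * FULL
    spilled = max(total - g3_capacity, 0.0)  # charge under G4 -> AND
    remaining = total - spilled              # charge left under G3 -> OR
    return spilled / FULL, remaining / FULL


for a in (0, 1):
    for b in (0, 1):
        and_out, or_out = and_or_gate(a, b)
        print(f"a={a} b={b}: AND={and_out:.0f} OR={or_out:.0f}")
```

The AND output is simply the overflow of a well sized for one packet, and the OR output is what the well retains; no separate inverting stage is involved, in line with the remark that inversion still requires a regenerator.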

A unique feature of Charge-Coupled Devices is the ease with which threshold logic, also called majority gate logic, can be implemented. Although it can be realised in such conventional technologies as n-MOS, CMOS and I2L, it will always require a relatively large number of transistors, while in CCD's it can be achieved with only a few gates. An example is given in fig. 2.7, showing the 'more than two out of four' (>2) function. Four input charge signals a, b, c and d are transferred under gate G5, which can contain only two full charge packets, as fixed by its area. If more than two input signals are logical '1', the surplus will flow over the potential barrier into the well under gate G6. In the case in which all four inputs are '1', this would imply the overflow of two packets to G6, so an extra gate G7 is added under which one packet can be spilled. The >2 output signal e is now normalised and can be written as:

$$e = a\cdot b\cdot c + b\cdot c\cdot d + c\cdot d\cdot a + d\cdot a\cdot b \qquad (2.2)$$

It should be noted that the spilled charge packet f represents the function $a\cdot b\cdot c\cdot d$ and can be used as such. Moreover, the charge that remains under G5 can be normalised in the same way as the signal e (not drawn), after which the resulting output signal represents the function a + b + c + d.
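The counting argument behind eq. (2.2) can be checked exhaustively with an idealised model of fig. 2.7, assuming that G5 holds exactly two full packets, G6 exactly one, and that the remainder spills under G7. The capacities are the ones stated in the text; the function name and the normalisation of the G5 remainder to at most one packet are illustrative assumptions.

```python
from itertools import product


def majority_gate(a, b, c, d, g5_capacity=2, g6_capacity=1):
    """Idealised four-input >2 gate of fig. 2.7; inputs are 0 or 1 full
    packets. Returns (e, f, or_out) in units of one full packet."""
    total = a + b + c + d
    under_g5 = min(total, g5_capacity)   # G5 holds two full packets
    overflow = total - under_g5          # surplus over the barrier
    e = min(overflow, g6_capacity)       # normalised >2 output under G6
    f = overflow - e                     # spilled under G7: a.b.c.d
    or_out = min(under_g5, 1)            # remainder normalised to one packet
    return e, f, or_out


for a, b, c, d in product((0, 1), repeat=4):
    print(a, b, c, d, "->", majority_gate(a, b, c, d))
```

Running through all 16 input combinations confirms that e is '1' exactly when at least three inputs are '1', that f is '1' only for four ones, and that the normalised remainder under G5 gives the OR of all inputs.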

Although the principles of CCD logic circuits were known in 1972, it was only in 1974 that experimental work in this field was initiated. Very extensive research is described in ref. [11], while the concept of the so-called Digital Charge-Coupled Logic (DCCL) is highlighted in refs. [6] and [12]. Most of the results described in these references were obtained with p-channel CCD's, as p-MOS technology was common in the early seventies. Later, the devices were redesigned for n-channel technology in order to improve speed. Other experiments with basic logic gates in n-channel technology are treated in ref. [7].

Figure 2.7: CCD four-input >2 majority gate; dashed lines: potential barriers.

None of the DCCL chips containing multipliers and transform circuits employed a regular mesh of processing elements, which complicated the design. Moreover, a very complex clocking scheme was required in all cases. Only in 1982 did Nash recognise that CCD logic is particularly well suited to implementing large, regular, pipelined architectures [13]. He combined Digital Charge-Coupled Logic with conventional n-MOS circuitry and realised, among other things, a 4-bit ripple adder in which the CCD's provided extremely efficient storage cells and logic functions, while n-MOS transistors accounted for a fast ripple carry through the full adders. The latter operation was not controlled by a system clock, so the number of adders that could be passed within a clock cycle was limited by the propagation delay of the carry.

Multiple-valued logic circuits based on Charge-Coupled Devices have been developed by Kerkhoff [14] [15]. In these circuits, 4 levels of charge storage are distinguished, thus increasing the information density and enabling new logic functions. Experimental results on multiple-valued CCD logic were obtained with n-MOS surface channel CCD's, in which the speed of operation was limited to about 1 MHz. This brings us to the characteristics and practical limitations of the CCD logic circuits that have been realised so far.

In surface channel CCD's with gate dimensions around 10 µm, the maximum speed of operation is limited by the rather low fringing fields, as mentioned in section 2.2. If only thermal diffusion were responsible for charge transport, which is certainly the case for gate lengths exceeding 20 µm, then the time-dependent total amount of charge Q(t) under the supplying gate would, apart from a fast initial drop, be given by:

$$Q(t) \approx \frac{8}{\pi^2}\,Q(0)\,\exp\!\left(-\frac{\pi^2 D\,t}{4L^2}\right) \qquad (2.3)$$

in which Q(0) is the total charge at t = 0, D is the carrier diffusion coefficient and L is the gate length [16]. For an n-channel device with a gate length of 10 µm, approximately 100 ns would be needed at room temperature to transfer more than 99% of the initial charge. In practical logic designs, the total length may be well in excess of 10 µm, as charge is transferred into a common well and spilled over a barrier into another well. Self-induced drift is responsible for a faster response for large charge packets, but its contribution can be neglected for times exceeding the time constant in eq. (2.3) and for initially small charge packets (< $10^{11}$ e/cm$^2$). Fringing fields, although small compared with those in buried channel CCD's, will improve the speed of charge transport. The analysis is rather complex and depends on the substrate impurity concentration, the clock voltage swing, the gate length and the size of the gap between the gates. However, an order of magnitude can be gained in transfer time in practical structures [16]. Note that in n-phase charge transport, a clock cycle includes n transfers.
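The order of magnitude quoted above can be checked with the standard first-mode diffusion result, assumed here as $Q(t)/Q(0) \approx (8/\pi^2)\exp(-t/\tau)$ with $\tau = 4L^2/(\pi^2 D)$ and $D = \mu kT/q$. The surface electron mobility of 600 cm$^2$/Vs used below is an assumed typical value, not a figure from the text, and the function names are hypothetical.

```python
import math

KT_OVER_Q = 0.02585  # thermal voltage kT/q at about 300 K (V)


def diffusion_time_constant(length_cm, mobility_cm2_vs):
    """tau = 4 L^2 / (pi^2 D), with D = mu kT/q (Einstein relation)."""
    d = mobility_cm2_vs * KT_OVER_Q
    return 4.0 * length_cm ** 2 / (math.pi ** 2 * d)


def transfer_time(fraction_left, length_cm, mobility_cm2_vs):
    """Time until only `fraction_left` of the packet remains under the gate,
    using the assumed tail Q(t)/Q(0) = (8/pi^2) exp(-t/tau)."""
    tau = diffusion_time_constant(length_cm, mobility_cm2_vs)
    return tau * math.log(8.0 / (math.pi ** 2 * fraction_left))


t10 = transfer_time(0.01, 10e-4, 600.0)  # 10 um gate, 99% transferred
t20 = transfer_time(0.01, 20e-4, 600.0)  # 20 um gate
print(f"10 um: {t10 * 1e9:.0f} ns, 20 um: {t20 * 1e9:.0f} ns")
```

With these assumptions the 99% transfer time for a 10 µm gate comes out at roughly 115 ns, consistent with the figure of about 100 ns given above. Because $\tau$ scales with $L^2$, a 20 µm gate needs four times as long.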

Digital Charge-Coupled Logic is dynamic logic, and the power dissipation P is given by:

$$P = C V^2 f_c \qquad (2.4)$$

in which C is the equivalent capacitive load of the clock, V is the clock voltage swing and $f_c$ is the clock frequency. Equation (2.4) has to be evaluated for every clock phase separately, since the load generally differs from phase to phase. Although this equation gives the on-chip power dissipation, the total dissipation including the clock is generally much higher due to external wiring capacitances. However, the main goal in large logic systems is to get the computation done within a given period of time without heating the chip too much. A figure of merit which expresses this has been used by Nash [13]; it is equivalent to throughput per unit chip area per unit power. Although this figure of merit shows the considerable advantage of CCD logic, it has to be noted that the rather low clock frequency of present CCD logic is masked by its low power dissipation and extremely low area consumption.
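Evaluating eq. (2.4) per phase and summing the results can be sketched as follows. The phase loads, swing and frequency are hypothetical numbers chosen only to show the bookkeeping; they are not measured values for any of the circuits mentioned.

```python
def clock_power(phase_loads_f, v_swing, f_clock):
    """On-chip dynamic power: eq. (2.4), P = C V^2 f_c, summed over the
    clock phases because the capacitive load differs from phase to phase."""
    return sum(c * v_swing ** 2 * f_clock for c in phase_loads_f)


# Hypothetical three-phase clock: 50, 50 and 80 pF loads, 10 V swing, 1 MHz.
p = clock_power([50e-12, 50e-12, 80e-12], 10.0, 1e6)
print(f"P = {p * 1e3:.1f} mW")
```

Because P grows with $V^2$ and with every added picofarad of load, overlapping electrodes, which add gate-gate capacitance to each phase, directly raise the dissipation.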

In order to minimise the gap between gates, which increases the speed of charge transfer, overlapping electrode structures are often employed (see e.g. [3]). This, however, also increases the power dissipation, as the gate-to-gate capacitance adds to the total clock load. This is especially important for small gate areas, where the relative overlap area is large, since the thickness of the insulation layer between the electrodes is about the same as the oxide thickness (100 nm).
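To see why the overlap matters most for small gates, a parallel-plate sketch compares an electrode's own gate capacitance with the overlap capacitance at its two edges. It assumes, hypothetically, 10 μm wide electrodes overlapping their neighbours by 1 μm on each edge, with oxide and inter-electrode insulator both 100 nm thick as stated in the text; fringing is ignored. The overlap fraction then reduces to 2 × overlap / L.

```python
EPS_OX = 3.9 * 8.854e-12   # permittivity of SiO2 (F/m)

def electrode_capacitances(width, length, overlap, t_ox=100e-9, t_ins=100e-9):
    """Parallel-plate estimate (SI units) of an electrode's gate capacitance and of
    the extra overlap capacitance at its two edges; fringing fields are ignored."""
    c_gate = EPS_OX * width * length / t_ox
    c_overlap = EPS_OX * width * 2.0 * overlap / t_ins
    return c_gate, c_overlap

# Hypothetical geometry: 10 um wide electrodes overlapping their neighbours by 1 um
for length in (20e-6, 10e-6, 4e-6):
    c_gate, c_ov = electrode_capacitances(10e-6, length, 1e-6)
    print(f"L = {length * 1e6:.0f} um: overlap adds {c_ov / c_gate:.0%} to the clock load")
```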

As the CCD clock is constantly running, power dissipation is in general not reduced in the absence of logical input signals, so as many computations as possible should be carried out per unit of time. For maximum efficiency, no CCD cell should be clocked without being used for some kind of logical operation or as local memory. This leads naturally to applications in highly concurrent signal processing involving continuous data streams.

A practical drawback of the Digital Charge-Coupled Logic devices realised so far has been the complicated clocking and the need for several DC voltages for sourcing or sinking charge packets, precharging floating gate amplifiers or controlling overflow barriers. This not only complicates the driving of the circuits, but also the interconnect task on the chip. The 2-input adder described in ref. [6], for example, used eight clock phases, three control clocks and three DC voltages.

The limitations of surface channel Charge-Coupled Logic circuits have stimulated research into logic gates based on Junction Charge-Coupled Devices. As these JCCD's offer extra possibilities for vertical charge transport, the aim of the research has been to exploit these possibilities rather than to implement DCCL functions in buried channel CCD's. Accordingly, the substrate PNP transistor that is present in Junction Charge-Coupled Devices has been the key element, to which most attention has been paid.

2.5 Bit-level Systolic Arrays

Systolic arrays were introduced in 1978 by H.T. Kung and C.E. Leiserson as a solution to the limited performance and the timing problems of conventional chip designs [17] [18]. The main principles of systolic arrays, however, were known earlier and are well treated in ref. [19]. The name 'systolic array' is derived from physiology, where 'systole' refers to the contraction of the heart that pumps blood through the blood vessels. The single processing element (PE), which is the building block of a systolic array, is the analogue of the heart, with data instead of blood being regularly pumped in and out of the processor to and from neighbouring processors in the array.

Figure 2.8: Conventional computer architecture with a single processing element (PE).

The systolic array combines several techniques to improve the performance of digital circuits for computation-bound problems, including pipelining and parallel processing. In computation-bound problems, the number of input and output (I/O) elements is smaller than the total number of operations. In the reverse case, the problem is called I/O-bound, and the performance has to be improved by speeding up the input and output circuitry. Computation-bound problems typically arise in signal processing applications.

To exemplify the systolic concept, we consider the multiplication of two n-bit words. A conventional way to do this is to use a single processor that repeatedly carries out the shift-and-add algorithm on data items from memory, as depicted in fig. 2.8.

The number of I/O operations is on the order of the word length n, and so the computation of the product takes on the order of n cycles of the system clock that controls the data flow. A more efficient and dedicated approach is depicted in fig. 2.9.
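A minimal sketch of the shift-and-add scheme of fig. 2.8, given only as an illustration; the function name `shift_and_add` is hypothetical. One loop iteration stands for one cycle of the system clock, which makes the order-n cycle count explicit.

```python
def shift_and_add(a, b, n):
    """Multiply two unsigned n-bit words a (multiplier) and b (multiplicand),
    one conditional add-and-shift step per clock cycle. Returns (product, cycles)."""
    product, cycles = 0, 0
    for i in range(n):               # examine multiplier bit a_i
        if (a >> i) & 1:
            product += b << i        # add the multiplicand, shifted to weight 2^i
        cycles += 1                  # each step costs one cycle of the system clock
    return product, cycles

print(shift_and_add(13, 11, 4))      # (143, 4)
```

The number of cycles grows linearly with the word length n, independent of the operand values.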

The structure basically consists of a mesh of horizontal and vertical lines, corresponding to the bits of the multiplier a and the multiplicand b, respectively. At the crossings, the partial products of bits $a_i b_j$ are calculated and fed into a Full Adder. The other inputs of the Full Adders are the sum and carry signals of equal significance from neighbouring Full Adders. The product output bits are indicated by $P_i$.
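The bit-level structure can be sketched as a row-by-row array of gated Full Adders. This is one plausible reading of fig. 2.9 (a ripple-carry array in which each row's carry moves to the next more significant adder in that row); the actual interconnection in the figure may differ, and all function names are hypothetical.

```python
def full_adder(x, y, c):
    """One-bit full adder: returns (sum, carry)."""
    return x ^ y ^ c, (x & y) | (x & c) | (y & c)

def to_bits(value, n):
    return [(value >> k) & 1 for k in range(n)]        # little-endian bit list

def from_bits(bits):
    return sum(bit << k for k, bit in enumerate(bits))

def array_multiply(a, b, n):
    """Array multiplier: the partial product a_i*b_j at crossing (i, j) is added by a
    gated full adder to the running sum bit of equal significance, while the carry
    ripples to the next more significant adder in the same row."""
    a_bits, b_bits = to_bits(a, n), to_bits(b, n)
    acc = [0] * (2 * n)                                # product bits P_0 .. P_{2n-1}
    for i in range(n):                                 # one row per multiplier bit a_i
        carry = 0
        for j in range(n):                             # one full adder per crossing (i, j)
            acc[i + j], carry = full_adder(acc[i + j], a_bits[i] & b_bits[j], carry)
        acc[i + n] = carry                             # this position is still 0 here
    return from_bits(acc)

print(array_multiply(13, 11, 4))
```

Comparing `array_multiply(a, b, 4)` with `a * b` for all 4-bit operands is a quick exhaustive check of the wiring.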


Figure 2.10: Pipelining of multiple operations by inserting latches between processing elements.

In this design, the number of I/O operations is limited to fetching the two input words and storing the result. Since it employs parallel processing of multiple bits, it can operate much faster than the previously described concept. However, the higher the significance of the output bit $P_i$, the more processing time is needed, because of the increasing number of serially connected Full Adders that have to be passed for its calculation. The total input-to-output delay equals $(2n-1)t_c$, in which $t_c$ is the carry delay of a single Full Adder.

Apart from techniques that reduce the delay of the carry signal, a more general approach can be followed to improve the operation of this parallel multiplier, namely pipelining. In the design of fig. 2.9 it can be observed that less significant Full Adders are inactive during the propagation of the carry to the most significant output. This characteristic is depicted in fig. 2.10 for a linear array of n processing elements without global interconnections.

If the input-to-output delay of a single processing element is $t_0$, the overall delay will be $n t_0$. This chain, however, can be split up into m sections by inserting memory cells (latches) in between processing elements, as is also shown in fig. 2.10. These memory cells guarantee valid input data for subsequent sections, so that m processes may be executed concurrently in the array. This is called pipelining. Although the total input-to-output delay (lag) is increased by m times the set-up time $t_m$ of a single memory cell, the amount of output data per second has also been increased by roughly a factor of m for continuous data flow. The number of output data bits (words) per second is usually referred to as throughput. The system is now synchronous, i.e. controlled by a clock Φ that shifts information through the memory cells. The operations performed by the processing elements in a pipelined system are not necessarily identical; nor do the individual delays have to be equal, as long as the clock Φ matches the maximum delay. An optimum between throughput and total I/O delay may, however, be selected through the distribution of latches.
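The trade-off between lag and throughput can be made concrete with a small sketch. It assumes n identical processing elements of delay $t_0$ split into m equal sections, with the clock period set by the slowest section plus one latch set-up time $t_m$; the numbers are hypothetical.

```python
def pipeline_figures(n, m, t0, tm):
    """Latency and throughput of a chain of n identical PEs (delay t0 each)
    split into m equal sections by latches with set-up time tm."""
    if n % m:
        raise ValueError("this sketch assumes m divides n")
    period = (n // m) * t0 + tm   # clock period: slowest section plus one latch
    latency = m * period          # equals n*t0 + m*tm
    throughput = 1.0 / period     # one result per clock once the pipeline is full
    return latency, throughput

# Hypothetical: 16 PEs of 10 ns each, 2 ns latch set-up time
unpipelined = 1.0 / (16 * 10e-9)
for m in (1, 2, 4, 8, 16):
    lat, thr = pipeline_figures(16, m, 10e-9, 2e-9)
    print(f"m = {m:2d}: lag = {lat * 1e9:5.0f} ns, throughput gain = {thr / unpipelined:.2f}")
```

The gain stays somewhat below m because every section pays the latch set-up time, which is why the text speaks of "roughly" a factor of m.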

Returning to the multiplier of fig. 2.9, this design can be pipelined by inserting memory cells at nodes that are synchronous, i.e. the data at these nodes is valid after the same minimal delay. Since the input signals are distributed globally through the mesh of bit lines, memory cells have to be inserted in these lines as well in order to apply the correct data at the nodes.

The systolic multiplier can be regarded as the ultimate level of pipelining and parallelism applied to the structure of fig. 2.9. Moreover, a pure systolic architecture consists of identical cells and has only local data interconnections. The result is shown in fig. 2.11 for a four-bit systolic multiplier [20].

The key element in this multiplier is the so-called inner-product step processor, represented by the circles in the grid. This processor performs the following functions:

• $S \leftarrow S' \oplus (a \cdot b) \oplus C'$

• $C \leftarrow (a \cdot b)\,S' + (a \cdot b)\,C' + S' \cdot C'$

• $a \leftarrow a$

• $b \leftarrow b$

The incoming signals a and b are transferred to the respective outputs after a single delay, which is indicated by the squares, while the outgoing sum S and carry C obey the same rules as in the gated Full Adder in the circuit of fig. 2.9. In fact, only a number of latches are added; as a consequence, the input words are now applied in a time-skewed manner. First the partial product $a_0 b_0$ is calculated in the upper left processor (the input sum S' and input carry C' are zero), and the result is transferred vertically down into the output queue. The output carry C is transmitted one row to the right, as its significance is one bit higher. In the next interval, the products $a_0 b_1$ and $a_1 b_0$ are calculated in the subsequent cells, and so on.

The latency of this four-bit multiplier is 11 cycles, as 11 latches are involved from input to output, but the computation of the same number of successive products is carried out concurrently. Although some of the carry and sum input signals of the upper cells are fixed to '0', the array still consists of identical inner-product step processors. Therefore, most effort is put into the design of a single cell.
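Since the design effort concentrates on the single cell, its logic can be sketched as a pure function. This is a minimal, assumed model: the latches (the squares in fig. 2.11) and the time skew are not represented, and the name `inner_product_step` is hypothetical.

```python
def inner_product_step(a, b, s_in, c_in):
    """One bit-level inner-product step cell (combinational part only).

    Returns (s_out, c_out, a_out, b_out); a and b are simply passed on.
    """
    p = a & b                                          # partial product bit a.b
    s_out = s_in ^ p ^ c_in                            # S <- S' xor (a.b) xor C'
    c_out = (p & s_in) | (p & c_in) | (s_in & c_in)    # C <- majority of the three inputs
    return s_out, c_out, a, b

print(inner_product_step(1, 1, 1, 0))   # (0, 1, 1, 1): 1 + 1 + 0 = binary 10
```

For every input combination, s_out + 2·c_out equals S' + a·b + C', which is exactly the gated Full Adder behaviour described for fig. 2.9.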

The most striking feature of fig. 2.11 is the regularity and modularity of the design. The structure can be extended by simply adding diagonals of cells to the right and inserting rows of cells, preceded by skewing latches, before the deskewing latches at the bottom of the array. Another way to expand the structure is to combine several 4 × 4 multipliers in a larger array with a word length that is a multiple of 4 bits. This, however, will lead to an unnecessary increase in I/O latency because of the skewing and deskewing of signals between individual 4 × 4 matrices. As communication is limited to six neighbouring cells in the grid, this type of systolic array is known as a hexagonally connected systolic array [17]. Several other types of systolic arrays have been proposed, such as the linearly connected, the orthogonally connected and the spiral systolic array [21] [22]. These architectures arise from such typical signal processing functions as convolution and orthogonal transformation, for which systolic algorithms have been developed [23] [24].

The structure of fig. 2.11 is a so-called bit-level systolic array, as only one bit at a time is handled in a single processing element [25] [26]. In other systolic arrays, the inter-cell connections are multibit lines and the cells themselves operate on words. Bit-level systolic arrays have been demonstrated in practice by Evans et al. [27].

A few remarks should be made at this point. Although communication in systolic arrays is in general limited to topological next neighbours, this may not be equivalent to geometrical next-neighbour communication in the actual chip layout. A clear structural example is the spiral systolic architecture proposed by S.Y. Kung [22] for the LU decomposition of matrices: in this array, communication paths exist between cells in the top row and cells in the bottom row of a large array, but these cells are still topological next neighbours, as can be seen by drawing the array on a cylinder. Such long connections, however, slow down the operation of a practical realisation, as does another problem that may be encountered in practice: clock skew. Systolic architectures are controlled by a global clock, whose signal is broadcast over the chip through long lines. These lines often exhibit a substantial series resistance, and this, together with the capacitive load introduced by the large number of memory cells, will lead to significant delays. However, the importance of this problem depends strongly on the technology.
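A first-order feel for this clock-line delay can be obtained with the Elmore delay of a uniform RC ladder, in which the series resistance and the combined wire and latch capacitance are spread evenly along the line. This is a standard textbook approximation, not an analysis from the text, and the line values below are hypothetical.

```python
def elmore_delay_uniform_line(r_total, c_total, segments):
    """Elmore delay at the far end of an RC ladder of `segments` equal sections.

    The capacitance at node k (k = 1..segments) is charged through the k
    upstream sections, so it contributes c_seg * k * r_seg to the delay.
    """
    r_seg, c_seg = r_total / segments, c_total / segments
    return sum(c_seg * (k * r_seg) for k in range(1, segments + 1))

# Hypothetical clock line: 2 kOhm series resistance, 20 pF of wire plus latch load
for n in (1, 10, 1000):
    print(f"{n:5d} sections: {elmore_delay_uniform_line(2e3, 20e-12, n) * 1e9:.2f} ns")
```

As the number of sections grows, the delay approaches RC/2 of the whole line, so doubling both the line length and the number of latches connected to it roughly quadruples the clock delay.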

Apart from the skewing and deskewing latches, a large number of latches are needed in the systolic array processors themselves, viz. four in each cell and one in between cells in the multiplier of fig. 2.11. This is a general characteristic of systolic arrays, and in practical realisations with static latches, the memory cells can take up to 70% of the total chip area [28]. Therefore, it is essential that systolic arrays be realised in technologies with inherently small memory cells.

In conventional circuitry (n-MOS, CMOS, I²L), memory cells are constructed with multiple transistors, the dynamic MOS cell being the smallest. In this cell, the logical state is preserved for some time by the presence or absence of charge on a capacitor, but the transfer of the logic state to subsequent sections is not equivalent to the transfer of this charge: some kind of operation, usually inversion, is always involved. As the logic state is represented by the charge packets themselves in Charge-Coupled Devices, these offer extremely small memory cells. Moreover, every cell is synchronised and consumes only little power. CCD's are therefore attractive for use in systolic arrays, provided they can at least be combined with logic circuits [13] or, more favourably, the memory cells can perform efficient logic functions themselves.

2.6 Conclusions

Charge-Coupled Devices have been introduced and their basic principles and characteristics have been discussed. In digital applications of CCD's, only a few discrete levels of transported charge are distinguished, namely two levels in binary logic and four levels in the multiple-valued circuits realised so far.

A CCD elementary cell can be regarded as a dynamic memory cell with low power consumption and small area. High density digital CCD memories have been reported in the literature. Experimental digital CCD circuits have been realised with both n- and p-type surface channel Charge-Coupled Devices; these have the advantage of relatively simple processing but the drawback of a limited operating speed.

The concept of the systolic array as a solution to the limited throughput of conventional digital chip architectures has been discussed. It is especially suited to processing continuous data streams, as it employs a high level of pipelining. The design task is simplified because the systolic array basically consists of a large number of identical cells with local memory. This holds in particular for bit-level systolic arrays.

The synchronous data transport in the array is fully comparable with the transport of charge packets in Charge-Coupled Devices, and the need for low power, high density memory cells in systolic arrays can be excellently fulfilled by Charge-Coupled Devices. In such arrays, the logic functions can be performed by compatible logic circuits (n-MOS) or by the CCD's themselves. Although highly pipelined, parallel digital circuits have been realised with CCD's, an all-CCD systolic chip has yet to be reported.


References

[1] C.H. Sequin and M.F. Tompsett, 'Charge Transfer Devices', Academic Press, New York, 1975

[2] G.S. Hobson, 'Charge Transfer Devices', Edward Arnold, London, 1978

[3] M.J. Howes and D.V. Morgan, eds., 'Charge-Coupled Devices and Systems', Wiley, New York, 1979

[4] W.E. Engeler, J.J. Tiemann and R.D. Baertsch, 'A Memory System Based on Surface-Charge Transport', IEEE Journal of Solid-State Circuits, vol. SC-6, no. 5, pp. 306-313, Oct. 1971

[5] M.F. Tompsett, 'A Simple Charge Regenerator for Use With Charge-Transfer Devices and the Design of Functional Logic Arrays', IEEE Journal of Solid-State Circuits, vol. SC-7, no. 3, pp. 237-242, June 1972

[6] R.J. Handy, 'Use of CCD in the Development of Digital Logic', IEEE Trans. on Electron Devices, vol. ED-24, no. 8, pp. 1049-1061, Aug. 1977

[7] J.H. Montgomery and H.S. Gamble, 'Basic CCD Logic Gates', The Radio and Electronic Engineer, vol. 50, no. 5, pp. 258-268, May 1980

[8] M.J. Pelgrom et al., 'A Digital Field Memory for Television Receivers', IEEE Trans. on Consumer Electronics, vol. CE-29, no. 3, pp. 242-250, Aug. 1983

[9] M. Yamada, K. Fujishima, K. Nagasawa and Y. Gamou, 'A New Multilevel Storage Structure for High Density CCD Memory', IEEE Journal of Solid-State Circuits, vol. SC-13, no. 5, pp. 688-693, Oct. 1978

[10] T.D. Mok and C.A.T. Salama, 'Logic Array Using Charge-Transfer Devices', Electronics Letters, vol. 8, no. 20, pp. 495-496, Oct. 1972

[11] R.A. Allen et al., 'Charge-Coupled Devices in Signal Processing Systems', vol. V, Final report, Navy contract no. N 00014-74-C0068, Dec. 1979


[12] T.A. Zimmerman, R.A. Allen and R.W. Jacobs, 'Digital Charge-Coupled Logic (DCCL)', IEEE Journal of Solid-State Circuits, vol. SC-12, no. 5, pp. 473-485, Oct. 1977

[13] J.G. Nash, 'Combinatorial Digital Logic Using Charge-Coupled Devices', IEEE Journal of Solid-State Circuits, vol. SC-17, no. 5, pp. 957-963, Oct. 1982

[14] H.G. Kerkhoff and M.L. Tervoert, 'Multiple-Valued Logic Charge-Coupled Devices', IEEE Trans. on Computers, vol. C-30, no. 9, pp. 644-652, Sept. 1981

[15] H.G. Kerkhoff, M.L. Tervoert and H.A.C. Tilmans, 'Design Considerations and Measurement Results of Multiple-Valued Logic CCD's', Proc. of the 11th Int. Symp. on Multiple-Valued Logic, Oklahoma City, OK, U.S.A., pp. 205-211, May 1981

[16] J.E. Carnes, W.F. Kosonocky and E.G. Ramberg, 'Free Charge Transfer in Charge-Coupled Devices', IEEE Trans. on Electron Devices, vol. ED-19, no. 6, pp. 798-808, June 1972

[17] H.T. Kung and C.E. Leiserson, 'Algorithms for VLSI Processor Arrays', in: 'Introduction to VLSI Systems', C.A. Mead and L.A. Conway, Addison-Wesley, 1980

[18] H.T. Kung, 'Why Systolic Architectures?', IEEE Computer, vol. 15, no. 1, pp. 37-46, Jan. 1982

[19] J.R. Jump and S.R. Ahuja, 'Effective Pipelining of Digital Systems', IEEE Trans. on Computers, vol. C-27, no. 9, pp. 855-865, Sept. 1978

[20] J. Hoekstra, 'Systolic Multiplier', Electronics Letters, vol. 20, no. 24, pp. 995-996, Nov. 1984

[21] H.T. Kung, 'Highly Parallel Systolic Algorithms and Their Implementation', Proc. 5th Aachen Symp. on Mathematical Methods in Signal Processing, Aachen, West Germany, Sept. 1984

[22] S.Y. Kung, 'VLSI Array Processors', IEEE ASSP Magazine, vol. 2, no. 3, pp. 4-22, July 1985


[23] M.M. McCabe, A.P.H. McCabe, B. Arambepola, I.N. Robinson and A.G. Corry, 'New Algorithms and Architectures for VLSI', GEC Journal of Science and Technology, vol. 48, no. 2, pp. 68-75, 1982

[24] T. Willey, R. Chapman, H. Yoho, T.S. Durrani and D. Preis, 'Systolic Implementations for Deconvolution, DFT and FFT', Proceedings of the IEE, vol. 132, Pt. F, no. 6, pp. 466-472, Oct. 1985

[25] J.V. McCanny and J.G. McWhirter, 'Implementation of Signal Processing Functions Using 1-bit Systolic Arrays', Electronics Letters, vol. 18, pp. 241- , 1982

[26] J.V. McCanny and J.G. McWhirter, 'Completely Iterative, Pipelined Multiplier Array Suitable for VLSI', Proceedings of the IEE, vol. 129, Pt. G, no. 2, pp. 40-46, April 1982

[27] R.A. Evans, D. Wood, K. Wood, J.V. McCanny, J.G. McWhirter and A.P.H. McCabe, 'A CMOS Implementation of a Systolic Multi-Bit Convolver Chip', in: VLSI '83, North Holland, Amsterdam, pp. 227-235, Aug. 1983

[28] J. Hoekstra, 'Junction Charge-Coupled Devices for Bit Level Systolic Arrays', International Workshop on Systolic Arrays, Technical Digest, Oxford, GB, July 1986


3 Lateral charge transport and vertical charge injection in Junction Charge-Coupled Devices

3.1 Introduction

In Junction Charge-Coupled Devices, the transport of charge packets in the channel is controlled by voltages across reverse-biased pn-junctions. This opens up the way to vertical charge transport through the steering gates, either by forward biasing these junctions, which is discussed in the next chapter, or by injecting minority carriers across the reverse-biased junctions into the JCCD channel. The latter operation is performed by the injector gate, a vertical NPN transistor in which the JCCD channel acts as a floating collector. The injector gate can easily be incorporated in the JCCD, as the JCCD is realised in a bipolar technology.

In this chapter, the operation of the Junction CCD will be described first, along with some important physical parameters such as channel potential and punch-through voltage. Although JCCD's can be realised with very simple processing [1], a number of processing steps have to be added if high quality devices are required [2]. For the present research, two fabrication processes have been employed: one was derived from the THD-01 bipolar process and the other from the industrial standard process N032. The processing steps and their impact on device performance will be discussed, and experimental results will be presented.

Injector gates were first applied in parallel-in serial-out filters [3], resulting in very regular layouts that can easily be adapted for other filter functions. There are, however, a few fundamental limitations on the use of the injector. The maximum value of the injector current is determined by the maximum amount of charge that can be stored in the JCCD, while the current itself determines the frequency response of the injector. For low values of the injector current, the transit frequency of the injector will be far below the JCCD operating frequency, thus limiting the filter frequency. These limitations are not unique to filter applications but apply to the general use of injector gates.
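The two limits can be illustrated with a first-order bipolar sketch. It assumes the well can take at most $Q_{max}$ per clock period, so the average injector current is bounded by $Q_{max} f_c$, and it uses $f_T \approx g_m / (2\pi C)$ with $g_m = I_C / V_T$, lumping all transistor capacitances into one hypothetical value C. The well capacity, clock frequency and capacitance are illustration values, not data from the text.

```python
import math

THERMAL_VOLTAGE = 0.02585  # kT/q at about 300 K (V)

def max_injector_current(q_max, f_clock):
    """Largest average current whose charge per clock period still fits in one well."""
    return q_max * f_clock

def injector_transit_frequency(i_c, c_total):
    """First-order f_T = g_m / (2 pi C), with bipolar transconductance g_m = I_C / V_T."""
    g_m = i_c / THERMAL_VOLTAGE
    return g_m / (2.0 * math.pi * c_total)

# Hypothetical values: 0.2 pC well capacity, 1 MHz clock, 0.5 pF lumped capacitance
i_max = max_injector_current(0.2e-12, 1e6)
for i_c in (i_max, i_max / 10):
    f_t = injector_transit_frequency(i_c, 0.5e-12)
    print(f"I_C = {i_c * 1e9:.0f} nA -> f_T ~ {f_t / 1e6:.2f} MHz")
```

With these numbers the transit frequency is only a few MHz at the largest allowed current and drops below the 1 MHz clock at one tenth of it, since $f_T$ scales linearly with the injector current.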

Another interesting application of injectors is found in the transfer of charge from largely capacitive charge sources into a CCD, as encountered in the readout of silicon strip detectors. Such detectors, consisting of a large number of strip-shaped diodes, are used in high energy physics experiments, where they provide information on the tracks of elementary particles travelling through the detectors. When a diode is hit by a particle, a charge signal results, and by stacking several detectors the particle tracks can be reconstructed. Strip detectors can handle high flux densities, since the output signal does not last more than a few tens of nanoseconds. Because of the large number of strips on a single detector (> 100), some form of multiplexing is necessary, such as that performed by the CCD particle detectors employed in the same field. The multiplexer should preferably be located close to the detector to keep the wiring capacitance low, but even in this case the detector capacitance is too high to allow all signal charges to be transferred quickly into a conventional CCD input without additional electronic amplifiers. The injector gate can overcome this problem because of its high transconductance compared with field effect devices. The results of initial experiments with silicon strip detectors and injector gate readout will be presented.
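Why transconductance matters here can be sketched with a first-order model in which the charge on the detector capacitance is drawn off with a time constant τ = C/g_m. The comparison uses the bipolar $g_m = I_C/V_T$ against the square-law MOSFET $g_m = 2I_D/(V_{GS}-V_{th})$ at the same bias current; the 10 pF source capacitance, 10 μA bias and 0.5 V overdrive are hypothetical, and the real readout dynamics are more complex than this.

```python
THERMAL_VOLTAGE = 0.02585  # kT/q at about 300 K (V)

def gm_bipolar(i_c):
    """Bipolar (injector) transconductance g_m = I_C / V_T."""
    return i_c / THERMAL_VOLTAGE

def gm_mos(i_d, v_overdrive):
    """Square-law MOSFET transconductance g_m = 2 I_D / (V_GS - V_th)."""
    return 2.0 * i_d / v_overdrive

def discharge_time_constant(c_source, g_m):
    """First-order time constant for drawing charge off a capacitive source: tau = C / g_m."""
    return c_source / g_m

# Hypothetical numbers: 10 pF strip plus wiring, 10 uA bias, 0.5 V MOS overdrive
C_STRIP, I_BIAS = 10e-12, 10e-6
tau_bip = discharge_time_constant(C_STRIP, gm_bipolar(I_BIAS))
tau_mos = discharge_time_constant(C_STRIP, gm_mos(I_BIAS, 0.5))
print(f"bipolar: {tau_bip * 1e9:.0f} ns, MOS: {tau_mos * 1e9:.0f} ns")
```

At equal current the ratio of the two time constants is $(V_{GS}-V_{th})/(2V_T)$, roughly a factor of ten for the assumed overdrive.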

3.2 Physics and fabrication technology of Junction Charge-Coupled Devices

Junction Charge-Coupled Devices are buried channel CCD's (BCCD's), which means that the charge packets are transported in the silicon bulk at some distance from the steering gates. The charge packets consist of majority carriers (electrons), and the packets are separated by fully depleted regions. A schematic cross-section of a Junction CCD is given in fig. 3.1.

The JCCD is made up of an n-type epilayer on a p-type substrate, in which a number of p-gates are placed. The n-epilayer is electrically contacted through the n+ source and drain regions. By applying a sufficiently large positive voltage to the n+ contacts with respect to the gates and substrate, the epilayer can be fully depleted. The depletion layer edges, extending from the substrate and the gate junctions, are denoted by dashed lines in fig. 3.1. Going along the line y-y', an electrostatic potential maximum is reached at the point where the depletion layers touch. This potential maximum is called the JCCD channel potential $\Phi_0$ if both the substrate and the gates are kept at ground potential. In all other cases, this potential maximum will be denoted by $\Phi_{ch}$. The channel potential $\Phi_0$ cannot be measured directly because of the built-in potential differences that result from different impurity concentrations. In practice the pinch-off voltage $V_0$ is measured, which is the voltage between the n+ regions and the gates and substrate at which the conductance between the n+ regions just becomes zero. This forms the basis of a potential well measurement technique [4].

Figure 3.1: Cross-section of a Junction CCD; dashed lines: depletion layer edges.

As in MOS CCD's, a local potential well for electrons can be created by applying a positive voltage step to one of the gates. This well can be filled with electrons as long as the electrostatic potential does not drop below the potential under the adjacent gates, which is usually $\Phi_0$.

3.2.1 Fabrication process #1

Two fabrication processes have been employed for the realisation of Junction CCD's and JCCD circuits. Process #1 requires only five masking steps and relatively simple processing, but the resulting JCCD's have rather large dimensions and poor characteristics. However, this process has been very helpful in verifying the principles of vertical charge flow and the logic circuits. The main processing steps are listed in table 3.1.

TABLE 3.1
process step                               details
p-substrate                                resistivity 50..80 Ωcm
   (MA-441 only)                           resistivity 20..30 Ωcm
n-epitaxial layer                          resistivity 5..8 Ωcm, thickness 7 μm
   (MA-441 only)                           thickness 8.5 μm
p-isolation (OP)                           sheet res. 9.5..10.5 Ω/□
p-base diffusion (gate) (SP)               sheet res. 180..200 Ω/□, depth 2.5 μm
n-emitter diffusion (source/drain) (SN)    sheet res. 5.5..6.5 Ω/□, depth 1.9 μm
contact holes (CO)                         min. dim. 10 × 10 μm²
   (MA-441 only)                           min. dim. 6 × 6 μm²
interconnection (IC)                       min. width 10 μm

Most processing steps were taken over from the THD-01 standard bipolar process. For the MA-441 chip, some extra modifications were made to the process because of the availability of 3-inch substrates. The deviating parameters for the epilayer and substrate have been added to the table.

The JCCD structure has been analysed with the aid of a two-dimensional Poisson solver [2] [5]. For this process, a Gaussian impurity profile for the gate and an abrupt substrate junction have been assumed in the potential calculations. These assumptions are acceptable, since rather large variations in process parameters, such as epilayer concentration and epilayer thickness, can be expected, and optimisation of the JCCD potential profile is not possible in this process. A typical two-dimensional potential distribution is shown in fig. 3.2 for a three-gate JCCD cell; cf. fig. 3.1. This plot was calculated using the depletion approximation, and the gate dimensions in the direction perpendicular to the plane of drawing have been assumed infinite [5], [6]. Gate A appears partly on the upper left and the upper right of the plot (as gate D) because of the periodicity of the structure. This gate, gate B and the substrate are held at ground potential while the epilayer is fully depleted. Seven Volts were applied to gate C, which is slightly more than the JCCD channel potential. This situation is common in logic circuits, as will be described in the following chapters.

Figure 3.2: Equipotential lines in a JCCD fabricated in process #1. Gates A and B are kept at ground potential; gate C is at 7 Volts.

A large parasitic potential well can be observed between gates A and B. This well will not completely disappear when one gate is clocked to 7 Volts, as can be seen from the equipotential lines around gate C. As a consequence, a large transfer inefficiency can be expected.

An important parameter is the value of the potential maximum at the silicon-silicon dioxide interface between two gates, which is called the JCCD surface potential $\Phi_s$. From now on, the notation $\Phi_s$ will be used exclusively for this parameter. If no gate voltages are applied, the surface potential is denoted by $\Phi_{s0}$. The value of $\Phi_{s0}$ determines the maximum allowable voltage difference between adjacent gates and, consequently, the clock voltage swing. In fig. 3.2, the potential $\Phi_{s0}$ is reached exactly halfway between gates A and B, since these gates are both at ground potential. When a positive voltage is applied to one of the gates, the potential maximum $\Phi_s$ increases and its position shifts towards the positive gate, as can be seen between gates B and C in fig. 3.2. This situation is illustrated in fig. 3.3 for a one-dimensional p+np+ structure, sometimes referred to as a reach-through or punch-through diode [7].

Figure 3.3: One-dimensional punch-through diode (a), electric field E and potential distribution Φ for a fully depleted n-layer and zero gate voltages (b), and electric field and potential distribution at punch-through (c).

Figure 3.3.b shows the behaviour of the electric field and the electrostatic potential along the surface between gates A and B in fig. 3.2. In fig. 3.3.c, the maximum allowable voltage is applied to the leftmost p+ region, and only the zero-bias electric field remains at its junction, the potential difference across it being equal to the built-in voltage $V_{bi}$ (reach-through). For higher voltages, this junction becomes forward biased and injection of holes is initiated. If the depletion layer widths in the heavily doped p-regions are neglected, then, with the aid of the depletion approximation:


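$$\Phi_{s0} = \frac{qN_D}{8\varepsilon_{si}}\,W^{2} \tag{3.1}$$

in which $N_D$ is the impurity concentration of the n-layer, $W$ the distance between the metallurgical junctions and $\varepsilon_{si}$ the permittivity of silicon. With the aid of fig. 3.3.c, the terminal voltage difference $V_{pt}$ between the p-regions at which punch-through occurs can be found:

$$V_{pt} = \frac{qN_D}{2\varepsilon_{si}}\,W^{2} - W\sqrt{\frac{2qN_D V_{bi}}{\varepsilon_{si}}} \tag{3.2}$$

which is equal to:

$$V_{pt} = 4\left(\Phi_{s0} - \sqrt{\Phi_{s0}\,V_{bi}}\right) \tag{3.3}$$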

Since the gate junction depth is less than the net gap width in the present JCCD's, a one-dimensional description is far from sufficient, and additional analysis with computer programs is necessary. Nevertheless, the strong dependence of $V_{pt}$ on $W$ and $N_D$ is clear.
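The one-dimensional expressions can be evaluated directly to show this dependence. The sketch below implements eqs. (3.1) to (3.3) in the depletion approximation; the built-in voltage of 0.8 V and the doping and gap values are assumptions chosen for illustration, and, as noted above, the real two-dimensional structure requires numerical analysis.

```python
import math

Q = 1.602e-19               # elementary charge (C)
EPS_SI = 11.7 * 8.854e-14   # permittivity of silicon (F/cm)

def phi_s0(n_d_cm3, w_cm):
    """Eq. (3.1): zero-bias surface potential q N_D W^2 / (8 eps_si), in volts."""
    return Q * n_d_cm3 * w_cm ** 2 / (8.0 * EPS_SI)

def v_pt_eq32(n_d_cm3, w_cm, v_bi):
    """Eq. (3.2): q N_D W^2 / (2 eps_si) - W sqrt(2 q N_D V_bi / eps_si)."""
    return (Q * n_d_cm3 * w_cm ** 2 / (2.0 * EPS_SI)
            - w_cm * math.sqrt(2.0 * Q * n_d_cm3 * v_bi / EPS_SI))

def v_pt_eq33(phi, v_bi):
    """Eq. (3.3): 4 (Phi_s0 - sqrt(Phi_s0 V_bi))."""
    return 4.0 * (phi - math.sqrt(phi * v_bi))

V_BI = 0.8  # assumed built-in voltage of the p+n junction (V)
for n_d in (5e14, 1e15):
    for w_um in (3.0, 4.0, 5.0):
        w = w_um * 1e-4
        print(f"N_D = {n_d:.0e} cm^-3, W = {w_um} um: "
              f"Phi_s0 = {phi_s0(n_d, w):.2f} V, V_pt = {v_pt_eq32(n_d, w, V_BI):.2f} V")
```

Eq. (3.2) and eq. (3.3) give identical results, since $qN_DW^2/(2\varepsilon_{si}) = 4\Phi_{s0}$; the leading term grows with $W^2$ and linearly with $N_D$.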

A large variation in punch-through voltage can be expected for the JCCD realised in process #1, since the epilayer concentration cannot be controlled tightly. On the other hand, a wider gap would lead to deeper parasitic potential wells, and therefore an optimum has to be selected. For the JCCD's in process #1, a gate-to-gate distance of 7.5 μm was designed and carefully controlled during processing, resulting in a net gap size in excess of 4 μm. The corresponding punch-through voltage is then above 6 V.

3.2.2 Fabrication process #2

Wolsheimer describes a JCCD process in which the parasitic potential wells between the gates can be eliminated, yielding Junction CCD's with small dimensions and low transfer losses [2] [8]. The same techniques have been applied in process #2 (N564), which is derived from the industrial standard bipolar process N032. The main processing steps are listed in table 3.2. The steps for the buried layer (BN) and the collector wall (DN) have been included in the process to enable the fabrication of high quality NPN transistors. The phosphorus channel implantation (NI) is performed


TABLE 3.2 process step p-substrate n-buried layer n-epitaxial layer p-isolation n-collector wall (BN) (DP) (DN) deta r e s i s t i v i t y sheet res. r e s i s t i v i t y thickness sheet res. sheet res. phosphorus surface implant, dose

annealing step phosphorus channe annealing step p-base diffusion (gate) emitter push implant, dose (SP) n-emitter diffusion (source/drain) contact holes 1st layer interconnection 1st layer anodisation contact holes 2nd layer interconnection 2nd layer (SN) (CO) (IN) (C02) (IN) scratch protection (CB) sheet res. depth sheet res. depth min. dim. min. width min spacing min. dim. min. width min. spacing i l s 50..80 ficm 13..19 a/a after 8 ftcm 7 ym 3..10 a/a ' 4..7 a/a 3 « 1 0n cm"2, 30 1200 °C, 60 min. 8>1012 cm"2, 30 1135 °C, 85 min. 195..235 a/a 0.95 ym 0,4 ym 10..15 a/a 1 ym 4x4 ym2 3 ym 3 ym 5x5 ym2 12 ym 8 ym epi keV keV


[Figure 3.4 plot: impurity concentration |N| (cm^-3) versus depth y (µm)]

Figure 3.4: Impurity profile through a p-type gate for a JCCD realised in process #2 (N564).

through the mask for the gates (SP), thereby locally increasing the impurity concentration in the epilayer under the gates.

The resulting impurity profile through a p-type gate, which was calculated with the technological simulation program SUPREM II [9], is shown in fig. 3.4.
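SUPREM II models implantation, diffusion and oxidation in far more detail than can be shown here. As a first-order illustration of how an implanted dose turns into a profile like the one in fig. 3.4, the sketch below uses the textbook Gaussian implant model broadened by an anneal with diffusion length √(Dt). It assumes no out-diffusion through the surface, handled by adding a mirror-image Gaussian. Only the channel implant dose of 8·10^12 cm^-2 listed in table 3.2 comes from the text. The projected range Rp, the straggle ΔRp and the diffusion length are hypothetical inputs.

```python
import math


def annealed_sigma(delta_rp_um: float, diff_length_um: float) -> float:
    """Standard deviation (um) of a Gaussian implant after an anneal.

    sigma**2 = dRp**2 + 2*D*t, with diff_length_um = sqrt(D*t).
    """
    return math.sqrt(delta_rp_um ** 2 + 2.0 * diff_length_um ** 2)


def implant_profile(y_um: float, dose_cm2: float, rp_um: float, sigma_um: float) -> float:
    """Impurity concentration (cm^-3) at depth y_um >= 0.

    Gaussian centred at rp_um plus its mirror image at -rp_um, so that the
    surface acts as a reflecting (no-flux) boundary and the profile
    integrates to dose_cm2 over y >= 0.
    """
    sigma_cm = sigma_um * 1e-4
    peak = dose_cm2 / (math.sqrt(2.0 * math.pi) * sigma_cm)

    def g(u: float) -> float:
        return math.exp(-(u ** 2) / (2.0 * sigma_um ** 2))

    return peak * (g(y_um - rp_um) + g(y_um + rp_um))


# Channel implant dose from table 3.2; Rp, dRp and sqrt(D*t) are hypothetical.
DOSE = 8e12                        # cm^-2
RP, DRP = 0.04, 0.02               # um, hypothetical
SIGMA = annealed_sigma(DRP, 0.5)   # hypothetical diffusion length of 0.5 um

for y in (0.0, 0.5, 1.0, 1.5):
    print(f"y = {y:.1f} um: N = {implant_profile(y, DOSE, RP, SIGMA):.2e} cm^-3")
```

The net profile through a gate, as in fig. 3.4, would follow by subtracting the p-type gate profile and the epilayer background in the same way. Effects such as concentration-dependent diffusion, which a process simulator includes, are absent from this sketch.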

The impurity concentration between the gates is enhanced by the surface implantation. According to (3.2), this increases the punch-through voltage, so that, for a given Vpt, the gate-to-gate distance may be decreased.
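Equation (3.2) itself is not repeated here. To make the trade-off concrete, the following sketch assumes that Vpt scales as N·L², where N is the impurity concentration in the gap and L is the net gap width. This is the scaling of a one-dimensional depletion-approximation estimate, not necessarily the form of (3.2). The function names and the factor-of-two concentration increase are hypothetical. The 4 µm and 6 V reference values are the process #1 figures quoted earlier.

```python
import math


def punch_through_scaled(vpt_ref: float, conc_ratio: float, gap_ratio: float) -> float:
    """Scale a reference punch-through voltage (V).

    Assumes Vpt is proportional to N * L**2 (N: impurity concentration in the
    gap, L: net gap width), as in a one-dimensional depletion estimate.
    """
    return vpt_ref * conc_ratio * gap_ratio ** 2


def gap_for_same_vpt(gap_um: float, conc_ratio: float) -> float:
    """Net gap width (um) that keeps Vpt unchanged when N is multiplied by conc_ratio."""
    if conc_ratio <= 0:
        raise ValueError("conc_ratio must be positive")
    return gap_um / math.sqrt(conc_ratio)


# Hypothetical example: doubling the concentration between the gates.
new_gap = gap_for_same_vpt(4.0, 2.0)
print(f"gap: 4.00 um -> {new_gap:.2f} um")                                   # gap: 4.00 um -> 2.83 um
print(f"Vpt check: {punch_through_scaled(6.0, 2.0, new_gap / 4.0):.2f} V")  # Vpt check: 6.00 V
```

Under this assumption, the gap shrinks only with the square root of the concentration increase. A substantial surface implant dose is therefore needed to obtain a noticeably smaller gap at the same Vpt.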

This has a beneficial influence on the potential profile in the JCCD channel. With the phosphorus channel implantation, the potential maximum Φ0 under the gates can be increased in such a way that it matches the potential maximum in the channel between the gates. In process #2, the originally designed gap width was 3.8 µm.


Figure 3.5: Equipotential lines in a JCCD fabricated in process #2.

appearing in the upper left and the upper right of the plot is shown in fig. 3.5. It was calculated with the SEMMY2 program [10], for which the impurity profile of fig. 3.4 served as the input. In SEMMY2, Maxwell-Boltzmann statistics are used instead of the depletion approximation, and terminal voltages are converted into electrostatic potentials.
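The difference between Maxwell-Boltzmann statistics and the depletion approximation is easiest to see in the mobile electron density near the edge of a depleted region. The sketch assumes that the electrons are in equilibrium with the neutral part of the n-layer (a common quasi-Fermi level). This is a simplification. It is not how SEMMY2 is implemented; it only illustrates the statistics SEMMY2 uses in place of the depletion approximation. The function names are hypothetical.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant (J/K)
Q = 1.602176634e-19    # elementary charge (C)


def thermal_voltage(temp_k: float = 300.0) -> float:
    """kT/q in volts."""
    return K_B * temp_k / Q


def electron_fraction_boltzmann(delta_v: float, temp_k: float = 300.0) -> float:
    """n / N_D in an n-layer region whose potential lies delta_v >= 0 volts
    below that of the neutral region, assuming Maxwell-Boltzmann statistics."""
    return math.exp(-delta_v / thermal_voltage(temp_k))


def electron_fraction_depletion(delta_v: float) -> float:
    """n / N_D under the depletion approximation: zero as soon as the region
    is depleted (delta_v > 0), N_D otherwise."""
    return 0.0 if delta_v > 0 else 1.0


for dv in (0.0, 0.025, 0.05, 0.1, 0.2):
    print(f"{dv * 1e3:5.0f} mV: Boltzmann {electron_fraction_boltzmann(dv):.3f}, "
          f"depletion approx. {electron_fraction_depletion(dv):.0f}")
```

The Boltzmann density decays over a few kT/q (about 26 mV at 300 K) instead of switching abruptly. This transition region is exactly what the depletion approximation neglects at the edges of the depleted regions.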

The parasitic potential well has completely disappeared in fig. 3.5 as a result of the phosphorus channel implantation and the decreased gap width. In this JCCD, the channel potential is about 10.5 V, while the surface potential is slightly above 6.5 V. For this structure, a punch-through voltage of about 13 V (2Φs) can be expected. Note that the initial gap of 3.8 µm on the SP mask results in a net gap width of only about 2.4 µm.
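The two numbers quoted above can be checked with a few lines of arithmetic. The sketch takes the relation Vpt ≈ 2Φs as stated in the text. As an added assumption, it attributes the loss between the 3.8 µm mask gap and the 2.4 µm net gap equally to both gate edges, for example through lateral diffusion of the p-type gates. The function names are hypothetical.

```python
def punch_through_estimate(phi_s: float) -> float:
    """Vpt ~ 2 * Phi_s, with Phi_s the surface potential between the gates (V)."""
    return 2.0 * phi_s


def gap_loss_per_edge(mask_gap_um: float, net_gap_um: float) -> float:
    """Distance (um) by which each gate junction moves beyond its mask edge,
    assuming the two edges of the gap shrink symmetrically."""
    return (mask_gap_um - net_gap_um) / 2.0


print(f"Vpt ~ {punch_through_estimate(6.5):.1f} V")          # Vpt ~ 13.0 V
print(f"{gap_loss_per_edge(3.8, 2.4):.2f} um per gate edge")  # 0.70 um per gate edge
```

Applied to process #1 (7.5 µm designed, more than 4 µm net), the same function bounds the loss there to less than 1.75 µm per edge.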

The technological simulations and potential calculations have proven invaluable in the design of JCCD's. However, additional experiments were necessary because of possible discrepancies between simulated and fabricated impurity profiles. The simulation programs can then be used to predict the behaviour for slightly modified process parameters. For most logic applications, the gap width was increased to 4.1 µm by over-exposing the
