
Design, construction, and operation of a dedicated processor for Monte Carlo experiments on Ising spin systems

A. HOOGLAND

TR diss 1588

The Delft Ising System Processor

Design, construction, and operation of a dedicated processor for Monte Carlo experiments on Ising spin systems

PROEFSCHRIFT (dissertation) submitted for the degree of doctor at Delft University of Technology, by authority of the Rector Magnificus, prof.dr. J.M. Dirken, to be defended in public before a committee appointed by the Board of Deans, on Thursday 26 November 1987 at 14.00

by ARNE HOOGLAND, born in Amsterdam, natuurkundig ingenieur (engineer in applied physics)


This dissertation has been approved by the promotor, prof.ir. B.P.Th. Veltman.

Dr. A. Compagner, appointed as supervisor by the Board of Deans, has contributed greatly to the realization of this dissertation.

Copyright © 1987 by Delft University of Technology, Delft, The Netherlands.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior written permission of the author, A. Hoogland, Delft University of Technology, Dept. of Applied Physics, P.O.Box 5046, 2600 GA Delft, The Netherlands.

CIP-DATA KONINKLIJKE BIBLIOTHEEK, DEN HAAG Hoogland, Arne

The Delft Ising System Processor : design, construction, and operation of a dedicated processor for Monte Carlo experiments on Ising spin systems / Arne Hoogland.

- [S.l. : s.n.] (Meppel : Krips repro). - Ill. Thesis Delft. - With ref. - With summary in Dutch. ISBN 90-9001753-4

SISO 521 UDC 681.3.02 : 519.24 (043.3)

Subject headings: Special-purpose computer; Monte Carlo method; Ising lattice.

Front cover illustration: Configuration obtained by the DISP of a square Ising spin lattice with three-spin interactions in both X and Y direction.

Back cover illustration: Celine Leopold

Figures: H.W.J. Blöte and A. Hoogland. Photography: A.R. Suiters

Stellingen (propositions accompanying this dissertation)

1. Finite-size scaling results for the critical exponents of models in the 4-state Potts universality class have proved unreliable when only small systems are used.

F. Iglói, D.V. Kapor, M. Skrinjar, and J. Sólyom, The critical behavior of a quantum spin problem with three-spin coupling, J. Phys. A16 (1983) 4067.

2. It is impossible to produce a finite sequence of random numbers that satisfies the conditions for complete chaos.

A. Compagner and A. Hoogland, Maximum-Length Sequences, Cellular Automata, and Random Numbers, J. Comp. Phys. 71 (1987) 391.

3. The quality of random-number generators is a subject that receives too little attention in the Monte Carlo literature. The need to assess the quality of these generators with tests closely related to the problem for which the random numbers are required deserves wider recognition.

M.N. Barber, R.B. Pearson, D. Toussaint, and J.L. Richardson, Univ. Calif. preprint #NSF-ITP-83-144 (1983).

4. Short Monte Carlo runs not only produce relatively large statistical errors but, as a consequence of long correlation lengths, can also lead to an underestimation of those statistical errors.

5. Their low cost, high computational speed, and monopolized availability for a priori bounded problem domains make special-purpose computers eminently suited for computationally intensive simulations of very long duration. Broad applicability of such equipment is therefore of lesser importance.

6. By using faster components with a considerably higher degree of miniaturization, the next generation of the DISP (Delft Ising System Processor) can be of the order of a hundred times faster than the present one.

7. Modifications and extensions of special-purpose computers, to be carried out in close cooperation between users and hardware and software specialists, are viable above all when a high degree of modularity of the hardware is pursued from the outset.

R.B. Pearson, J.L. Richardson, and D. Toussaint, A special purpose machine for Monte Carlo simulation, J. Comp. Phys. 51 (1983) 241.

8. The term "cottage industry", sometimes used to describe the activities surrounding the development of special-purpose computers, conveys an impression only of the scale of those activities.

9. Reserving the word "measurement" exclusively for observations of physical reality shows little respect for the quantification of observations in models.

10. Charging part of the cost of using central university computing facilities to the users' consumables budget strongly stimulates the purchase of personal computers and thereby erodes the continued existence of these central facilities.

11. Placing advertisements to recruit PhD students might well be superfluous if the cost of these advertisements were added to the salary of those concerned.

12. The efforts of some primary school teachers to drastically simplify the spelling of the Dutch language testify more to their own shortcomings than to a concession to the children.

13. Demonstrating an anthroposophical way of life by conspicuously renouncing technological achievements does not testify to a good understanding of Steiner's views.

14. The foil wrapping commonly used for storing perishable goods in the refrigerator will give the recipient of the HOOP memorandum food for thought.


"Any method goes"

Paul Feyerabend


Special-purpose computers are a new technique in computational physics to calculate properties of many-particle systems. At a moderate cost, computational devices can be built that compete with general-purpose computers in speed, while being available 24 hours a day.

Because of the advances in the field of micro-electronics, the first attempts in Delft to make small special-purpose processors (using early SSI-RTL logic) were soon followed by the construction of rather powerful processors for crystal-growth simulation, for molecular dynamics, and for Monte Carlo calculations on large Ising or Ising-like systems. A fruitful co-operation of theoreticians and experimentalists made a successful operation of these processors possible. A Navier-Stokes solver and a new very fast Ising system processor are expected to be operational in 1988.

Although numerical problems in physics have recently profited from the availability of mini-supercomputers, a growing number of institutions are active in the development of dedicated hardware devices. For certain categories of problems, the favorable price-performance ratio makes these special-purpose machines attractive. The necessary effort to construct the hardware and to develop the accompanying software should however not be underestimated.


CONTENTS

1. INTRODUCTION
1.1 What is computational physics?
1.2 Statistical mechanics and dedicated hardware
1.3 Outline

2. METHODOLOGICAL REMARKS
2.1 Physical aspects
2.2 Hardware considerations
2.2.1 Architecture
2.2.2 Construction
2.3 Software considerations
2.3.1 Special-purpose processor software
2.3.2 Software for additional computing facilities
2.4 Experimental environment

3. DESIGN CHARACTERISTICS

4. HARDWARE ARCHITECTURE
4.1 Functional organization
4.2 Bus structure
4.3 Spin-memory organization and neighbor determination
4.4 The transition-probability table
4.5 Lattice-sums updating
4.6 Monte Carlo Renormalization Group calculations
4.6.1 Block-spin transformation
4.6.2 Calculation of lattice sums
4.6.3 Data processing
4.7 Communication with the host computer
4.8 Constructional details

5. THE RANDOM-NUMBER GENERATOR
5.1 Methods for random-number generation
5.2 Shift-register RNG's
5.3 Results with a 127-bit shift-register RNG

6. PERFORMANCE

7. SOFTWARE
7.1 Monte Carlo experiments
7.2 MCRG experiments

8. SUMMARY OF RESULTS
8.1 Weak universality in the Baxter model
8.2 Finite-size dependence of the susceptibility of the simple cubic Ising model
8.3 The Ising model on the triangular lattice
8.4 The 2-D ANNNI model
8.5 Critical behavior of two Ising models with multispin interactions
8.6 The simple quadratic Ising model with crossing bonds

9. CURRENT ACTIVITIES
9.1 Additional hardware features of the DISP
9.1.1 A 64K x 32 bit TP look-up table
9.1.2 Extension of the number of lattice-sum registers
9.1.3 Spin-spin correlation function hardware
9.2 Profile of the second generation DISP

10. CONCLUSION

REFERENCES
SUMMARY
SAMENVATTING
ACKNOWLEDGEMENTS
CURRICULUM VITAE


1. INTRODUCTION

1.1. What is computational physics?

In the last two or three decades, the increasing availability and adaptability of faster computers with larger memories have influenced many human activities. In physics, almost all fields have been affected, so much so that the term "computational physics" tends to lose any specific meaning. Here, the term will be used to indicate the study of physical systems with a large number of degrees of freedom by means of numerical methods; in particular, the numerical simulation of these systems will be implied. Typical examples are the many-particle systems of statistical mechanics, such as fluids or spin systems, studied with the molecular-dynamics method or with Monte Carlo calculations.

The merits of computational physics lie in its ability to extend theoretical physics beyond the limitations of analytic methods and at the same time in providing experimental results unobtainable by conventional empirical techniques. For instance, the magnetization of a three-dimensional Ising spin system can be quite easily determined in a Monte Carlo calculation, but exact theoretical results do not exist and true measurements for such an idealized system are not straightforward. It is not difficult to "measure" the pressure of a gas of hard spheres in a molecular-dynamics calculation, but an analytic expression for this quantity is hard to obtain and true measurements for this simplified model of a real gas are impossible. These examples show that computational physics can provide valuable stepping stones between theory and experiment.

Computational physics has of course its own limitations. The results obtained are usually not precise but show a certain amount of statistical scatter, not unlike the results of true experiments. The length of simulation runs is by necessity restricted. Also, the systems that can be studied successfully are rather small in comparison with macroscopic systems. In addition, many problems in large-scale computations suffer from slow convergence. To increase the statistical accuracy of results obtained by numerical simulations by a factor of two, an increase in computing time by a factor of four is usually needed. Similarly, when the size of a matrix is increased by a factor of two, the time needed for a numerical diagonalization goes up by a factor of about eight.
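The arithmetic behind both statements can be made explicit; as a rough sketch (standard error and complexity estimates, not specific to the machines discussed here):

    \sigma \propto M^{-1/2} \;\Rightarrow\; M \propto \sigma^{-2},
    \qquad
    t_{\mathrm{diag}}(n) \sim n^{3} \;\Rightarrow\; \frac{t_{\mathrm{diag}}(2n)}{t_{\mathrm{diag}}(n)} \approx 2^{3} = 8,

where \sigma is the statistical error, M the number of effectively independent samples (proportional to the computing time), and n the matrix dimension.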

These circumstances have often forced computational physicists to search out the most powerful computing facilities available. The full use of a supercomputer would indeed often enable one to obtain sufficiently accurate results, but because of limited funding supercomputer time must usually be shared with many other users. However, numerical simulations in statistical mechanics have in common that the bulk of the

computational effort is concentrated in simple elementary operations that have to be repeated a great many times. This situation is ideally suited for the use of special-purpose computers, which can handle a limited class of problems efficiently, and therefore can provide supercomputer power at a low price. This thesis describes the construction and functioning of such a machine.

1.2. Statistical mechanics and dedicated hardware

A major topic in theoretical statistical physics is the theory of phase transitions, including such subjects as the determination of phase diagrams, critical exponents, and the dynamics of nucleation. The study of these problems by means of numerical simulations on a general-purpose computer has become a standard technique. The reviews given by Abraham [1] for the molecular-dynamics method and by Binder [2] for the Monte Carlo method contain many examples.

However, the difficulty of producing sufficiently accurate results within reasonable limits of time and money has remained. As stated already, the use of supercomputers may alleviate the first of these restraints, but is detrimental to the other. This difficulty is particularly important in the field of critical phenomena, where the physical systems studied exhibit long relaxation times and distances, requiring the simulation of large systems during long periods. The requirement that the simulated system be large may perhaps be relaxed somewhat, at least for lattice-spin systems, by making use of the so-called Monte Carlo Renormalization Group method [3,4,5], based on the renormalization-group approach initiated by Wilson in the late 1960's; a certain price in terms of complexity of the calculations has to be paid, however. In any case, the study of phase transitions and critical phenomena by simulation techniques stresses the need for large computational power, thereby making the construction of special-purpose computers for these subjects into a challenging task [6].

Quite a number of these machines, for different specific problems, have been built in recent years or are now under construction. Examples are the machine built by Pearson et al. [7] for the Ising model, by Ogielski [8] for the spin glass model, by Toffoli [9] for cellular automata, and by Herrmann et al. [10] for the calculation of the electrical conductivity of percolation clusters. These machines have in common that they sacrifice flexibility in exchange for a greater computational speed for one particular algorithm.

In Delft, the idea of building special-purpose processors for simulation problems was originated by Veltman [11], and led to the construction of a small Monte Carlo machine for the Ising model (capable of handling 32x32 spins with nearest-neighbor interactions only), a counting machine for the determination of the combinatorial factor for Ising


systems up to 6 x 6 spins [12], and a Monte Carlo machine for the solid-on-solid model for crystal growth [13]. These machines paved the way for the construction of the Delft Molecular Dynamics Processor (DMDP) and the Delft Ising System Processor (DISP). The former, built by Bakker [14,15], is a machine for the simulation of solids and fluids by means of the molecular-dynamics method; for results obtained with this machine see e.g. refs. [15]. The latter is a machine for the Monte Carlo simulation of Ising spin systems and is the subject of this thesis.

Indeed, the stochastic algorithm used in the Monte Carlo simulation of Ising spin systems is a perfect test case for the construction of a special-purpose computer because of the binary character of the main variables involved, the spin values. Since its construction in the years 1979-1982, the DISP has produced a number of results [16,17,18,19], showing that it indeed offers an efficient and low-cost alternative to the use of a supercomputer for the study of phase transitions and critical phenomena in Ising systems. Recently, the hardware of the DISP has been extended to allow Monte Carlo Renormalization Group calculations in addition to the more conventional Monte Carlo calculations.

Once the decision is reached to build a special-purpose computer for a particular problem, the question arises as to what level of specialization should be adopted for the machine. Although a precise answer to this question does not exist, the following remarks may serve as a guideline. The main advantage of a special-purpose computer, dedicated to a particular task, is its constant availability during 24 hours per day and 360 days per year, apart from time needed for servicing, test runs for auxiliary software, or pilot runs for future studies. The overall efficiency of the machine during its expected lifetime of 6 to 10 years must be judged in terms of the useful results to be produced. It does not make sense to make the machine more flexible than is necessary, and it would be a bit wasteful to build a machine for a problem that is too narrow.

1.3. Outline

This thesis is arranged in the following manner. Chapter 2 is devoted to the general methodology of the design and construction of a special-purpose computer for the solution of a particular class of problems. The design desiderata, i.e. the desired scope of the DISP in terms of its calculational possibilities and the Hamiltonians that it can handle, are set out in chapter 3. The chosen realization in hardware, key to the low-cost calculational efficiency and flexibility of the DISP, is described in chapter 4, both for the MC and the MCRG algorithms. Considerable effort was invested in the hardware random-number generator to be employed; this is the subject of chapter 5. Chapter 6 is

dedicated to the performance of the DISP in comparison with general-purpose computers. A short survey is given in chapter 7 of the software written for the host computer, necessary for initialization of the DISP, for data acquisition, and for the determination of results. Chapter 8 contains a review of some results obtained with the DISP up to the moment of writing. Chapter 9 is a discussion devoted to hardware extensions of the DISP and to a next-generation MC machine for Ising systems, which (due to recent and expected progress in the field of hardware components) can be considerably faster than the DISP. Concluding remarks are the subject of chapter 10.


2. METHODOLOGICAL REMARKS

A unique method for the construction of a dedicated processor for problems in computational physics does not exist. Too much depends on the problem in question, on the research group that wants to build the processor, and on the available funding. The sketch given here of the necessary or desirable conditions is based on the experience of the Delft computational physics group. An attempt is made to give a general description, however, including for instance technological possibilities of today that were not available when the DISP (exclusive of the MCRG hardware) was designed and built.

Before the decision is made to build a special-purpose machine, a feasibility study must be made in which a number of questions should find a preliminary answer. Does the problem to be studied justify the effort of building the machine? Is the problem suitable? Does one have the necessary expertise, or can it be attracted from elsewhere? What does the global design of the machine look like? What are the desired properties of the machine? How large is the necessary budget?

Already for such a feasibility study the co-operation of different specialists is usually necessary. Computational physicists with a sound theoretical background in the problem in question are needed to estimate the ins and outs of the problem and to define the desired performance of the machine. Hardware specialists, experienced in the use and development of conventional and advanced micro-electronics, are essential to evaluate the technical possibilities. The parties involved should be sufficiently motivated to sustain a protracted effort: the actual construction of a dedicated processor that during its expected lifetime produces worthwhile results is usually not a minor affair.

In Delft, these preliminary conditions were met, which led to the formation of the Delft computational physics group and to the construction and use of the Delft Molecular Dynamics Processor and the Delft Ising System Processor.


2.1. Physical aspects

A first step in determining the feasibility of a dedicated hardware device is to estimate the width (which quantities must be calculated?) and the depth (the desired accuracy of the results) of the physical problems to be studied. A trade-off must be made here between speed, flexibility and complexity. Furthermore, the calculations necessary for the physical problem selected must be based on a sufficiently simple algorithm that can be easily carried out in hardware. Modifications of the hardware of a special-purpose


computer are almost impossible once the device is built. Extensive standard programming on general-purpose computers must be employed to determine the applicability and reliability of the algorithm for the selected range of problems. When a particular algorithm has already proven its value for solving the physical problem at hand (as is the case for the Monte Carlo method for Ising systems), this activity can be omitted. However, this procedure should always be carried out for other hardware sections that are part of the data processing (see chapter 2.3: software emulation of the special-purpose computer).

Special attention must be paid to a possible parallelization and vectorization of the algorithm, techniques applied in supercomputers and array processors. Parallelization, by incorporating a multitude of processing sections, all handling the same task, can speed up the execution rate by a factor equal to the number of sections. Vectorization, a matter of "pipelining" the data stream, decreases the computational time step to a level determined by the interval between successive storage of intermediate results. These architectural features lead to a large potential increase in computational speed, but may cause inconsistencies in the computational process. This is due to the fact that in the pipeline the different intermediate results must remain independent. For instance, when in the Ising model the Monte Carlo process is pipelined, a selected central spin may not enter (either as a central spin or as a neighbor) the pipeline again, because otherwise conflicting values of the same spin in the pipeline may arise. The addressing scheme of spin selection must take care of this problem.
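As an illustration of this independence constraint in a pipelined Monte Carlo process, the following sketch (in present-day Python, with all names illustrative and no relation to the DISP's actual addressing hardware) selects central spins such that no spin currently in the pipeline is touched again, either as a central spin or as a neighbor:

    import random

    def select_nonconflicting_sites(neighbors_of, n_sites, pipeline_depth):
        """Generate central-spin sites such that no site still in the pipeline
        overlaps (as central spin or neighbor) with a newly selected site.
        Illustrative sketch only; a hardware address generator would enforce
        the same rule by construction."""
        in_flight = []                      # footprints of updates still in the pipeline
        while True:
            s = random.randrange(n_sites)
            footprint = {s} | set(neighbors_of(s))   # central spin plus its local configuration
            if any(footprint & older for older in in_flight):
                continue                    # would create conflicting spin copies; reselect
            in_flight.append(footprint)
            if len(in_flight) > pipeline_depth:
                in_flight.pop(0)            # the oldest update has left the pipeline
            yield s

For a one-dimensional ring of N spins, for example, one would pass neighbors_of = lambda s: ((s - 1) % N, (s + 1) % N).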

2.2. Hardware considerations

2.2.1. Architecture

In this section the possible components of a dedicated processor and of additional computational facilities are characterized.

Elementary logic and microprocessors

Once the details of the physical desiderata are determined, knowledge of the intrinsic possibilities of basic hardware elements and microprocessors enters the picture. From the scope that has been set out, a decision can be made whether the use of standard microprocessors, possibly microprogrammable and possibly a certain number in parallel, will meet the requirements. The use of microprocessors will usually provide more


flexibility than the use of basic logic elements, since microprocessors can easily be reprogrammed. On the other hand, the use of elementary logic usually brings a considerable speed improvement, since the only limiting factors are the set-up times and propagation delays of the logic elements and the cycle times of the memories used. A close look at existing special-purpose processors shows that basic logic elements are used in the most time-critical stages of a design, while microprocessors perform tasks for which more time is available (control, data accumulation, preliminary data reduction, etc.). Nevertheless, the use of "random logic" around microprocessors is still required. For algorithms that impose complex calculations on a special-purpose processor, time-critical stages require a hybrid solution in which both methods are closely interwoven.

A decision at which level and for what functions basic logic elements must be used depends largely on the gain in speed needed, the extra complexity involved, and the availability of VLSI devices and gate arrays.

Logic families

When speed requires the use of logic elements, the family to be used must still be chosen. Although TTL logic (both high-power and low-power Schottky) is still the most popular family, not least because it has the widest range of functional elements available, CMOS logic is gaining ground. Two features drive this transition: CMOS is now as fast as or even faster than high-power Schottky TTL logic (and considerably faster than low-power Schottky TTL), and the currents involved are at least two orders of magnitude lower. Since most CMOS elements are available with TTL-compatible logic levels, the use of CMOS should be thoroughly considered. Latch-up problems that may result in irreparable internal damage are still the major drawback in the use of CMOS elements. However, new CMOS generations are known to be less vulnerable in this respect.

ECL logic, the fastest logic available, is difficult to handle, precisely because of its high speed, but also because of its high power consumption. Except for those elements in this family that combine internal ECL logic with a TTL interface, and thus with TTL-compatible logic levels to the outer world, the use of ECL is not recommended here.

Gate arrays

A recent development that may have a great influence on the design of special-purpose computers on the basis of logic elements is that of gate arrays. Soon they will offer the possibility to create, at relatively low cost, "random logic" VLSI chips (up to 200,000 gates) in very fast CMOS technology, giving the user the possibility to tailor these chips to his particular needs. A typical example of the use of gate arrays would be a single-chip random-number generator equivalent to the multi-component one now in use in the DISP.

However, the effort needed to design these devices is large. Although software programs may be used to "glue" certain basic preprogrammed gate-array functions together, the process to generate the masks that contain the pattern of interconnections of the elementary gates involves extensive software testing on a single gate basis to guarantee the production of bug-free chips.

PROM's and PLA's

Less powerful, but also requiring less effort in programming than gate arrays are PROM's (programmable read-only memories) and PLA's (programmable logic arrays). With these devices one may cut down the component count of parts of the hardware design considerably.

PROM's provide fast non-volatile memory, while PLA's are used for logic functions in combination with registers, thus forming a bridge between logic elements and gate arrays. An important advantage of the use of PLA's instead of logic elements is obvious in the test phase of the hardware device: to correct hardware errors it is simpler to program new PLA's than to rewire the interconnections of small-scale logic elements.

The host computer

The special-purpose machine must have a fast data-communication link with a general-purpose computer. This host computer has a wide variety of tasks, fundamentally different from the functional elements that make up the special-purpose device. The host primarily serves as a data concentrator and therefore must be equipped with a non-volatile background memory for storage of results of measurements. It is also used for initialization of the special-purpose hardware. Some examples are: loading look-up tables and other memories, reading back these data to check correctness, loading programs when microprocessors are incorporated in the hardware, setting up the data communication links with other microprocessor systems, personal computers, and large-scale computers. The host computer must also support program development in parallel with other tasks and thus must be able to run in a multi-program environment.


2.2.2. Construction

Apart from studying a particular physical problem with a requested accuracy and a given algorithm, a special-purpose machine should have a modular structure, which facilitates modifications as well as testing and repairing.

Modularity can be attained by omitting the integration of hardware sections with fundamentally different functions. Modularity serves flexibility in the sense that additional functions can easily be implemented or existing functions can be changed after the construction of the processor is completed and that faulty or less reliable sections can easily be replaced by duplicates. Flexibility is also increased by using RAM's (random-access memories, to be loaded by the host computer) rather than PROM's for use in look-up tables, decoding functions, etc., since data supplied by PROM's can only be changed by chip replacement.

The selection and combination of high-speed hardware components require close inspection of the data specifications supplied by the producer of the chips, in particular with respect to speed and load conditions. In general, not typical but worst-case ratings must be used; even then, an ample safety margin should be taken into account when applying these components.

The next step is the production of prototype boards. Since high-speed hardware components are used on these boards, stringent requirements are placed on their lay-out and interconnections. Most important are the necessity to keep the length of interconnections within certain limits and to suppress transient effects leading to reflections in these interconnections. The first requirement can be satisfied by minimizing the number of components and by the construction of high-density printed-circuit boards. This constructional aspect contributes simultaneously to the second requirement since reflections die out faster when the wire length is smaller. Nevertheless, careful tuning of the resistors for characteristic termination of backplane wiring is still necessary. High-density printed-circuit boards require the application of "fine-line" technology, for which the width of the interconnections is 80 to 200 microns only.

Synchronous operation of all clocked hardware elements is important. It allows single-step operation of the special-purpose processor when errors cannot be traced by other means. The basic clock signal should be distributed in such a way that the distance from the pulse generator to all elements that require the clock signal is about equal.

Wire-wrapping is a very common means for interconnecting hardware elements; however, oxidation of wire-wrap pins may gradually decrease the reliability of these interconnections.


2.3. Software considerations

2.3.1. Special-purpose processor software

To operate a special-purpose processor a considerable amount of special-purpose software is needed, in particular to control elementary hardware functions. Typical examples are programs that establish the link between the hardware device and the host computer or peripheral equipment, programs needed to load microprocessors, to store the look-up table values (including correctness tests) or to initialize various other registers, and diagnostic programs.

The latter are used to test and maintain the special-purpose hardware. For the localization of hardware errors several possibilities are available:

1. Diagnostic programs used during the construction of the special-purpose processor for prototype testing (i.e. testing of assembled printed-circuit boards).

2. Diagnostic programs to trace hardware errors in the assembled special-purpose processor.

3. Software emulation of the special-purpose processor.

4. Diagnostic subroutines that are part of the application software for regular tests on the correctness of particular hardware functions.

The main task of the first two types of diagnostic programs is the detection of design errors and the localization of defective integrated circuits.

The third type is a procedure to make a detailed comparison between Monte Carlo calculations of the special-purpose processor with a software program that simulates the processor in all its aspects. Since the emulation program copies the intended functioning of the machine in every detail, it is a handy tool in the localization of systematic and accidental (hardware) errors, although it also can be used in the design phase of the special-purpose processor to determine the correctness of the design.

The last type is applied to detect hardware failures at the earliest possible stage when the machine is operated. The implementation of these subroutines makes the results of experiments more reliable.

The software packages for testing and application of the special-purpose processor may become quite voluminous and may ultimately contain bugs themselves. In the years the DISP has been operational, the hardware has often served to trace software bugs.


2.3.2. Software for additional computing facilities

A fundamental requirement to use special-purpose devices is that an efficient interaction exists with other computational systems. These systems are the resources necessary for putting the hardware device to work, for monitoring experiments, to support its continuous operation, and for post-processing results.

A sophisticated software environment for the additional computational facilities (with emphasis on user-friendliness) may highly affect the productivity when employing the special-purpose processor. The available system software for these facilities should have been designed to minimize the time needed to develop the application software for a given project and to maximize the ease of extending and modifying existing software. Important tools in this respect are an advanced operating system with a high-level programmable user interface, utilities (screen editors, debugging facilities, file management, etc.), advanced programming languages, data bases, and transparent mutual access between the computational resources.

A basic requirement of system software is portability, which means that all attached commercial computing devices look the same to the user as far as is possible. This minimizes the effort in (re)developing application software. Machines not running the "standard" operating system should at least have the same software utilities available (e.g., editors, programming languages). To achieve this level of portability requires that all system software be written in a portable language. Application software can then be compiled and run on any machine with a compiler for that language without having to bother about lost and newly gained options. An example of a nearly portable operating system is UNIX.

In the field of programming languages Fortran is still the most widely used, next to being reasonably portable. Since it has regularly been upgraded (now Fortran 77) it has kept its popularity among computational physicists. However, modern computing languages are being used in an increasingly significant way. The language C has intrinsic capabilities that reach far beyond Fortran and should therefore be available as an alternative standard programming language.

Much effort should be invested in software documentation since user supplied documentation is frequently out of date. It is recommendable to develop standards to prevent the proliferation of incompatible software versions.

2.4. Experimental environment

A requirement in the endeavor to construct a special-purpose processor successfully is the availability of adequate financial means for purchasing the necessary parts for the special-purpose computer and for setting up an infrastructure for the electronics laboratory where the hardware device has to be constructed. The investments to be made must not be underestimated. The following means must be available or directly accessible:

PROM-PLA programming devices

As has been explained in chapter 2.2.1, the use of PROM's and PLA's may contribute considerably to a denser constructional design, to higher speed, and to less cumbersome hardware debugging. Thus, the use of these components is expedient, and accordingly one must be able to program them.

Devices for program preparation, programming, and verification of PROM's and PLA's are commercially available, usually in combination with a personal computer and (for PLA's) with software that simulates the logic functions to be programmed.

CAD-CAE workstations

The design of printed-circuit boards requires advanced software techniques for creation of the lay-out, especially when the use of fine-line technology is required or when, due to the complexity of the design, multilayer boards are needed. Unless through-plated printed-circuit boards can be produced within one's direct environment (as is the case in the Faculty of Applied Physics in Delft), either the produced lay-out or data carriers (punched tape, diskettes, etc.) must be used to delegate the production to commercial institutions. Although usually prototype boards are produced first, it may in some cases be advisable to build preliminary prototype boards by means of wire-wrapping.

A semi-automatic machine for wire-wrapping (full automatic machines are very expensive), connected to a computer capable of driving it may be used for the production of prototype and final versions of the boards in the special-purpose processor. The choice to be made depends on the costs involved, the desired compactness of the hardware device (printed-circuit boards are considerably thinner than wire-wrap boards), and the expected reliability.


Gate-array development stations

Acquiring the expertise necessary for the design of masks of interconnections in gate arrays takes a lot of time. The preparation of the masks requires access to out-of-house facilities. However, once the technique is mastered, the possibilities to minimize the size of important parts of a special-purpose machine, next to the improvement in reliability and speed, are for the time being unprecedented. Since gate arrays can be characterized as semi-custom integrated circuits, it is clear that their production is only financially attractive when a large number of identical chips has to be produced. When the possibility to produce different designs on a single silicon wafer can be realized, the production of small numbers of identical chips (>50) becomes financially attractive.

Microprocessor development stations

The development of (micro)programs for microprocessors that are part of a design can be done on systems that are commercially available from the manufacturer of these microprocessors. Since these systems are usually not universal (capable of handling many different types of microprocessors) it may be advisable to develop cross compilers and cross loaders for the host computer in order to facilitate easy debugging and downloading of programs for microprocessors.

Testing equipment

When errors in the special-purpose processor cannot be found by software means, which is normally the case when the system is constructed (design errors), the flow of binary data must be traced by means of logic analyzers that are fast enough to follow and store a series of logic events. The most advanced logic analyzers have means to store the setting and the signals on floppy disks. In case hardware errors arise, this facility allows comparison of the actual logic signals at test pins with the correct behavior, stored on the floppy disk. Much simpler devices that must be available for performing measurements on a local level are logic probes and logic pulsers.

Communication with number crunchers

Since the capability of a special-purpose computer is usually limited to handling a single algorithm, the host computer (possibly in addition to other small scale computing devices) has the task to swallow the resulting data that may be produced in vast amounts with very high speed. The computational capability and capacity of the host then may leave hardly any room for additional data processing, especially since the host must also support program development and facilitate the output of results, either produced locally

or externally. This data must then be forwarded to and processed by large mainframes. The availability of a data link (ethernet, remote job entry, RS 232, etc.) and software that facilitates transparent communication with such systems is considerably more practical than conventional transportation by magnetic tapes. The protocol for data links depends largely on the amount of data that has to be transferred.


3. DESIGN CHARACTERISTICS

The development of the DISP aimed at the construction of a cheap processor (necessary hardware investment less than 10^4 US-$) with a speed of at least one elementary MC step per microsecond for Ising systems of up to a few million spins in two and three dimensions (2-D and 3-D respectively), with as wide a range of Hamiltonians as is expedient.

The Ising systems that are worth studying do not just contain pair interactions and interactions with an external field. Rather, the design to be selected should enable MC calculations including at least four different interactions of any type, though restricted to a certain range. Quite arbitrarily, this range was chosen to be a square area of 8 x 8 spin sites in 2-D or a cubic volume of 4 x 4 x 4 spin sites in 3-D. The number of spins within this range, 64, is small enough to be manageable while being sufficiently large to allow many different Hamiltonians.

To further explain this point, which is basic for the structure of the DISP, the following terminology is used. The spin that at some moment in the MC simulation is selected for the MC process is called the central spin. The collection of spins with which the central spin interacts through terms in the Hamiltonian is called the local configuration (including the central spin). In the elementary MC process the new value of the central spin is determined by the Boltzmann factors p_+ and p_- valid for the given local configuration with the central spin "up" or "down" respectively. These Boltzmann factors (or rather, the related transition probabilities) can be stored in advance in a look-up table, the addresses of which must be determined from the local configuration. For fast simulation, the spin values of the local configuration must therefore be readable simultaneously, which implies that they should be contained in separate parts of the memory in which all spin values are stored, these parts (or memory banks) being simultaneously addressable and readable. If the local configuration does not extend beyond the (square or cubic) range of 64 spins, no more than 64 separate memory banks will be needed. Each spin within this range can then be accommodated on a separate memory bank. Which spins on which memory banks should be read is completely determined by the addresses of the spins within the local configuration; in the chosen structure this information is available simultaneously.
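A minimal software analogue of this look-up idea, for the simple case of a nearest-neighbor pair interaction only (Python; the coupling K is assumed to absorb the temperature factor as in Eq. (3.1), and the table layout is illustrative, not the DISP's):

    import itertools, math

    def build_tp_table(K, n_neighbors=4):
        """Precompute P = p+ / (p+ + p-) for every local configuration of
        n_neighbors Ising spins interacting with the central spin through a
        single pair coupling K. Keys encode the neighbor spins as bits
        (0 -> -1, 1 -> +1)."""
        table = {}
        for bits in itertools.product((0, 1), repeat=n_neighbors):
            spins = [2 * b - 1 for b in bits]
            h = K * sum(spins)                       # local field on the central spin
            p_plus, p_minus = math.exp(h), math.exp(-h)
            table[bits] = p_plus / (p_plus + p_minus)
        return table

In the DISP the same information is held in a hardware look-up table that is addressed directly by the simultaneously read spin values.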

Less drastic for the structure of the DISP are two other features that were adopted. Periodic boundary conditions of the toroidal type without helical shifts were chosen. In addition it was desired that the central spin, i.e. the spin to be subjected to the MC process, could be selected either randomly or by going through the lattice in a sequential manner.

A typical Hamiltonian to be studied is

    H = K_1 \sum_{i=1}^{N} s_i + K_2 \sum_{\langle i,j \rangle} s_i s_j + K_3 \sum_{\langle i,j,k \rangle} s_i s_j s_k + K_4 \sum_{\langle i,j,k,l \rangle} s_i s_j s_k s_l ,    (3.1)

where N is the number of spins, K_i denotes the coupling constants, and s_i = ±1

the spin values. The notation <i,j,k> indicates that the corresponding lattice sum must be taken over all translations over the lattice of one or more (e.g. up and down triangles) particular triples, selected from the (square or cubic) range of 64 spins discussed above; a similar remark applies to <i,j> and <i,j,k,l>. It is possible to choose these sets of interacting neighbors such that the symmetry of the Hamiltonian is different from simple quadratic or simple cubic. For instance, in two dimensions one can define <i,j> to include, in rectangular coordinates, the neighbors at ±(0,1), ±(1,0) and ±(1,1). This choice leads to a triangular Ising model with nearest-neighbor interactions. Analogously face- and body-centered cubic models can be defined in 3-D. The expression (3.1) was taken to define the desired scope of the machine in the design phase, but it does not exhaust the possibilities of the final realization of the DISP. For instance, modifications in which the terms with K_3 and K_4 are replaced by further pair interactions, e.g. with second and third neighbors, are also allowed. Often, these modifications can be realized by software adaptations; in some cases, minor hardware modifications would be necessary. However, practical considerations cause certain additional (though not restrictive) constraints on the Hamiltonians to be studied on the DISP; these constraints will be discussed later.

The summation in each term of the chosen Hamiltonian is performed by accumulation in lattice-sum registers. When necessary, other terms can easily be added to the four included in the above Hamiltonian, by using more lattice-sum registers and by extending the size of the look-up table in which the transition probabilities are stored for the spin-flip mechanism.
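The accumulation can be kept incremental: when one spin flips, each lattice sum changes only through the terms that contain that spin. A hedged sketch (Python; the data layout `partners` is purely illustrative and not how the DISP hardware stores its interaction sets):

    def update_lattice_sums(sums, spins, i, partners):
        """Update the lattice-sum registers after flipping spin i.
        sums[0] accumulates sum(s_i); sums[t] for t = 1, 2, 3 accumulate the
        pair, triple, and quadruple sums of Eq. (3.1). partners[t][i] lists,
        for term t, the groups of *other* sites that appear together with
        site i in that term."""
        s_old = spins[i]
        spins[i] = -s_old                     # the flip itself
        sums[0] += -2 * s_old                 # single-spin term
        for t in (1, 2, 3):
            delta = 0
            for group in partners[t].get(i, ()):
                prod = 1
                for j in group:               # product over the unchanged partner spins
                    prod *= spins[j]
                delta += prod
            sums[t] += -2 * s_old * delta     # each affected product changes by -2*s_old*prod
        return sums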

The random-number generator was chosen to be of the linear 2-bit-feedback shift-register type. In order to investigate its quality, a high degree of flexibility regarding shift-register length and feedback position was required.
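For reference, a two-tap (linear-feedback) shift register is easily emulated in software. In the sketch below the register length of 127 and the feedback position are only an assumption for illustration (x^127 + x^63 + 1 is a commonly quoted primitive trinomial); the DISP hardware allows both to be varied, which is exactly the flexibility referred to above.

    class ShiftRegisterRNG:
        """Two-tap XOR-feedback shift register; the period is maximal only if
        the corresponding feedback trinomial is primitive."""

        def __init__(self, length=127, tap=63, seed=1):
            assert 0 < tap < length and seed != 0
            self.length, self.tap = length, tap
            self.mask = (1 << length) - 1
            self.state = seed & self.mask

        def next_bit(self):
            fb = ((self.state >> (self.length - 1)) ^ (self.state >> (self.tap - 1))) & 1
            self.state = ((self.state << 1) | fb) & self.mask
            return fb

        def next_word(self, bits=32):
            """Assemble a uniform `bits`-wide integer, e.g. for comparison with
            a 32-bit transition probability."""
            w = 0
            for _ in range(bits):
                w = (w << 1) | self.next_bit()
            return w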

At a later stage, the design of the DISP has been extended to perform Monte Carlo Renormalization Group (MCRG) calculations. This means that the processor must not only be able to simulate Ising systems by means of the MC method, it must also be able to calculate many different multi-spin correlations. Furthermore it has to be able to renormalize the spin configuration, i.e. it has to map the spin configuration onto a renormalized lattice which is smaller by a factor of two in linear size. As it turned out, these extensions could easily be incorporated into the design of the DISP. Details of these extensions are given in chapter 4.6.
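As an aside, the block-spin (renormalization) step itself is conceptually simple; a plain-Python sketch for a 2-D lattice with the majority rule and random tie-breaking (illustrative only, not the DISP's table-driven implementation described in chapter 4.6):

    import random

    def block_spin_majority(spins, L):
        """Map an L x L configuration of +/-1 spins onto an (L/2) x (L/2)
        lattice of block spins: majority rule over 2 x 2 blocks, ties broken
        at random."""
        assert L % 2 == 0
        blocked = [[0] * (L // 2) for _ in range(L // 2)]
        for bx in range(L // 2):
            for by in range(L // 2):
                total = sum(spins[2 * bx + i][2 * by + j]
                            for i in range(2) for j in range(2))
                blocked[bx][by] = 1 if total > 0 else -1 if total < 0 else random.choice((-1, 1))
        return blocked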


4. HARDWARE ARCHITECTURE

4.1. Functional organization

As shown in Fig. 1, the DISP is composed of a number of hardware sections, each carrying out one or more operations that are part of the elementary spin-flip mechanism.

The address-generation section consists of several address generators: one for the selection of random sites for the central spin, one for sequential selection, others for loading and reading look-up tables and the main spin memory, and still others for the MCRG block-spin generation and correlation function determination process. The generator to be used is decided in advance of a particular operation by means of software instructions for the host computer, for which a HP 1000 system is available. The address of the central spin selected is passed on to the section which determines the local configuration; this section is programmed by the host computer to generate the spin addresses of the local configuration (taking into account the details of the Ising system to be studied: dimensionality, lattice structure, Hamiltonian, block-spin generation). The spin values of the local configuration are read simultaneously from the main spin memory (consisting of 64 different memory banks) and determine the address in the look-up table

[Fig. 1: Functional organization of the DISP (block diagram): host computer (HP 1000), address generation, local-configuration table, main spin memory (4M bits), auxiliary spin memory, block-spin memory, correlation-function determination, lattice-sums updating, transition-probability table, random-number generator, and comparator.]

where the relevant transition probability is found. This transition-probability (TP) table is filled as part of the initialization procedure by the HP 1000; the value stored in a particular entry pertaining to a particular local configuration is

    P = \frac{p_+}{p_+ + p_-} ,    (4.1)

where p_+ and p_- are the Boltzmann factors for the particular local configuration with the

central spin "up" and "down" respectively. This corresponds with the transition probabilities for the MC simulation of Ising systems used first by Yang [20]; alternatively, the DISP can also be run with the transition probabilities used by Fosdick [21]. On the other hand, when in the renormalization process a block-spin configuration must be determined, the TP table is filled with values that correspond with the particular rule (e.g. the majority rule) with which the block spins are defined.

The value P found in the TP table for the actual local configuration is compared with a random number R (uniformly distributed between 0 and 1) given by the random number generator (RNG) in the comparator section; if P > R holds, the central spin is given the value +1, otherwise -1. If the new value of the central spin differs from the old one, the main spin memory must be corrected and the four different lattice-sum registers (one for each term in the Hamiltonian (3.1) or a similar one) must be updated. The contents of these registers can be used to calculate the thermodynamic quantities of the system under investigation, either at the end of a complete MC run or at certain predetermined moments during a run.
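Put together, the elementary decision described above amounts to the following few lines (Python sketch; `local_config_of` and `tp_table` are illustrative stand-ins for the LC tables and the TP table, and the lattice-sum bookkeeping is handled separately as sketched in chapter 3):

    import random

    def mc_step(spins, site, local_config_of, tp_table, rand=random.random):
        """One elementary MC step: read the local configuration, look up P,
        compare with a uniform random number R, and set the central spin to
        +1 if P > R and to -1 otherwise."""
        key = local_config_of(spins, site)      # e.g. a tuple of neighbor spin values
        P = tp_table[key]
        new_value = +1 if P > rand() else -1
        flipped = (new_value != spins[site])    # only then must memory and registers be updated
        spins[site] = new_value
        return flipped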

The other sections of the DISP shown in Fig. 1, i.e. the auxiliary spin memory, the block-spin memory, and the block-spin correlation function section are needed for MCRG calculations and will be discussed below. In general, the remainder of this chapter is concerned with further details of the hardware structure.

4.2. Bus structure

The sections of Fig. 1 are interconnected by means of four data busses: the C(ontrol)-bus, the A(ddress)-bus, the D(ata)-bus, and the S(pin)-bus. The C-bus is a 16-bit data path that controls the function of the hardware sections of Fig. 1 and the opening and closing of data paths within the DISP. The A-bus is 22 bits wide and carries the addresses for selection of central spin sites during the MC process, next to the transfer of addresses for loading and reading look-up table values, register values, etc.. The 16-bit wide D-bus is used for data flow from the host computer to the different


sections of the DISP and vice versa; in addition, the D-bus is used for some data transfers within the DISP. Finally, the main task of the 31-bits wide S-bus is to carry central and neighboring spin values from the main spin memory to the look-up table, but it can also be used for data I/O with the host computer. The bus structure is shown in Fig. 2, in which most available functions of the DISP are included.

4.3. Spin-memory organization and neighbor determination

The main spin memory has a size of 2^22 (4M) bits, in which (or in part of which for smaller lattices) the momentary spin configuration is stored. This configuration is generated by the MC process or, at the start of an experiment, initiated by the host computer. The neighbor determination section is interwoven with the spin memory. Given the central-spin address, it produces the addresses of all spins of the local configuration simultaneously and directs the spin values to the S-bus. The method uses a subdivision of the lattice in 64 separate memory banks, each having a single bit output. Thus, theoretically 64 spins can be produced simultaneously. However, the hardware demultiplexing scheme only allows for processing a local configuration of up to 31 spins, which was considered to be sufficiently large; this of course restricts the Hamiltonians that can be studied with the DISP.

In line with the addressing scheme of the main spin memory, the lattice can be thought to be subdivided in an array of cells, each containing 64 spins. Each spin within

one such cell resides in a different memory bank, while each memory bank contains the spins on one particular location in all cells. The cell size, being equal to the number of memory banks, is conveniently chosen to be 64, which is both a square and a cube. Thus a 2-D lattice is represented by an array of square blocks of 8 x 8 spins, a 3-D lattice by an array of cubic blocks of 4 x 4 x 4 spins. This way the hardware needed for switching between 2-D and 3-D lattices is simplified considerably. Comparison with the discussion in chapter 3 will show that the range in which the local configuration of spins can be defined is identical with the cell size (i.e. spins on the same position in different cells reside on the same memory bank and cannot be read simultaneously). Furthermore, the smallest lattice that can be defined is represented by one cell (one bit of each memory bank).

[Fig. 2: Bus structure of the DISP. The data, control, address, and spin busses interconnect the HP 1000 host computer, parallel/serial conversion, video lattice display, MCRG correlation registers, local-configuration tables, MCRG and sequential address generators, the 64K spin-memory banks, auxiliary and block-spin memories, lattice-sum registers, transition-probability table, data reduction, random-number generators, and the comparator.]

As the total system size is 4M spins, each memory bank consists of randomly accessible memory elements, together 64K x 1 bit large. The hardware within the 64 memory banks has to transform a 22-bit address on the A-bus into a 31-bit data package on the S-bus. This data package is to be routed to the TP table. The position of neighbors on the S-bus depends exclusively on the position of these neighbors with respect to the central spin. The 22-bit address on the A-bus is split up in two parts. The least significant six bits define the position of the central spin within a cell (choice of the memory bank in which the central spin resides). The other 16 bits are used to select the cell in which the central spin resides (position within that memory bank).
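In software the same address split is a pair of bit operations (sketch; the constants follow directly from the 64-bank organization described above):

    def split_spin_address(addr):
        """Decompose a 22-bit spin address: the least significant 6 bits select
        one of the 64 memory banks (the position within an 8x8 or 4x4x4 cell),
        the remaining 16 bits select the cell, i.e. the word within that bank."""
        bank = addr & 0x3F      # 6 bits
        cell = addr >> 6        # 16 bits (split further into X/Y or X/Y/Z parts)
        return bank, cell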

The task of the additional hardware of each memory bank, all receiving the same 16-bit cell address of the central spin, is to find out whether the addressed spin is part of the local configuration. If this is the case, it must decide whether the spin is located within the same cell as the central spin, and what line of the S-bus the orientation of the

selected spin is to be sent to. Since a certain memory bank only contains spins residing on a particular position within each cell, the key to selecting the right spin (or not selecting any spin at all) of a memory bank is the use of a look-up table, the local-configuration (LC) table. This look-up table is primarily used to point out the position of the addressed spin on the S-bus. When this spin is not part of the local configuration, it is sent to the "32nd S-bus" position, which is a dead track on the memory bank hardware. This explains why the S-bus is just 31 bits wide. The hardware scheme as outlined above is shown in Fig. 3.

Fig. 3: One of the 64 identical parts of the spin memory, each with its own neighbor-identification and decoding section [cell-address correction, lattice-size masking, the 64K x 1 bit memory-bank RAM, the 64 x 12 bit local-configuration table, and the S-bus location selector].

When the lattice size is larger than one cell only, additional information must be supplied by the LC tables because certain neighbors may reside in adjacent cells in the X, Y, and Z directions. In this case the 16-bit cell address is corrected. Zero is added to this 16-bit word when the cell address is identical, +1 or -1 for positive and negative corrections, respectively, in the X, Y, and Z directions. Obviously the LC table on each memory bank needs 64 entries, with five bits out for S-bus selection and six bits out for X, Y, and Z cell-address correction.

For direct determination of the location of the central spin, an additional bit is used in the LC tables. This eliminates the need for decoding the S-bus line number data in order to find out on what memory bank a possible spin-flip command is to be executed. The data in the LC tables depend strongly on the lattice structure (2-D: triangular or square; 3-D: simple cubic, fcc or bcc) and on the interactions that have to be taken into account. For this reason random-access memories (three 64 x 4 bit RAM's on each memory bank) are used, to be loaded by the host computer. Reading back this data from the DISP to the host computer is a standard test practice of the initialization procedure for each MC run.
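A possible software picture of one 12-bit LC-table entry is sketched below (Python); the thesis fixes only the field widths (5 bits for S-bus selection, 6 bits for the X, Y, Z corrections, 1 bit for the central-spin flag), so the field order and the 2-bit code for -1/0/+1 used here are assumptions for illustration:

    def pack_lc_entry(sbus_line, dx, dy, dz, is_central):
        """Pack one local-configuration table entry into 12 bits."""
        corr = {0: 0b00, +1: 0b01, -1: 0b11}          # assumed 2-bit code for a signed correction
        entry = sbus_line & 0x1F                      # 0..30 select an S-bus line, 31 = dead track
        entry |= corr[dx] << 5
        entry |= corr[dy] << 7
        entry |= corr[dz] << 9
        entry |= (1 if is_central else 0) << 11       # this bank holds the central spin
        return entry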

The 16-bit cell address is divided into two 8-bit parts for the X and Y coordinates in 2-D, and into three 5-bit parts for the X, Y, and Z coordinates in 3-D systems (in 3-D the maximum lattice size is 2M spin sites). Masking out the most significant address bit in a certain direction reduces the size of the lattice in that direction by a factor of two. In this way correct periodic boundary conditions are automatically maintained, because overflow of the cell address in a certain direction resets the cell address in that direction to zero. Consequently, the size of the lattice in any direction is restricted to powers of two. As a result, the possible system sizes are:

L = 2^k x 2^l, with 3 <= k, l <= 11 (2-D),
L = 2^k x 2^l x 2^m, with 2 <= k, l, m <= 7 (3-D).
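The effect of the address masking on the boundary conditions can be mimicked as in the sketch below for one direction of a 2-D system; the function and variable names are illustrative, and the mask is derived from the exponent k used above (a cell being 8 x 8 spins, there are 2^(k-3) cells per direction).

    #include <stdio.h>

    /* Sketch of the periodic-boundary mechanism: the cell coordinate in a
     * given direction is masked down to the number of cells in that
     * direction, so that an overflow (or underflow) wraps back to zero
     * (or to the last cell).  Illustration only, not the DISP hardware. */
    static unsigned wrap_cell(unsigned cell, int step, unsigned k)
    {
        unsigned mask = (1u << (k - 3)) - 1u;    /* number of cells - 1 */
        return (unsigned)((int)cell + step) & mask;
    }

    int main(void)
    {
        unsigned k = 5;                          /* 2^5 = 32 spins wide */
        unsigned last = (1u << (k - 3)) - 1u;    /* index of last cell  */
        printf("right of last cell: %u\n", wrap_cell(last, +1, k));  /* 0    */
        printf("left  of cell 0   : %u\n", wrap_cell(0,   -1, k));   /* last */
        return 0;
    }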


4.4. The transition-probability table

The data in the look-up table in which the transition probabilities of Eq. (4.1) are stored (the so-called TP table), have a resolution of 32 bits. In order to vary the system structure, coupling constants, etc., the use of random-access memory elements combined with I/O with the host computer is imperative.

The first problem that had to be dealt with is that a local configuration of 31 spins on the S-bus cannot be routed directly to the TP table (32 x 2^32 bits of storage would be needed). However, the number of entries into the TP table can be reduced considerably because many spin configurations have equal energies. As the number of plus-minus (PM) bonds of the central spin with its interacting neighbors of a certain type partly determines the energy level of a spin configuration, PROM's may be used that are programmed to generate this number; this procedure will be indicated by PM count. Whenever this is done, not the orientation of the neighbors but the result of the PM count is part of the address for the TP table. It should be noted that the spins used in the PM count cannot be used for multi-spin interactions. At the time the DISP was built, it was decided to include elementary three- and four-spin interactions for 2-D systems only. When exclusively nearest and next-nearest neighbors take part in the active interactions, the number of TP-table entries for a few typical lattices is as follows:

2-D square:      2^9  (central spin and all 8 neighbors),
2-D triangular:  2^10 (central spin, 6 nearest neighbors, and PM count of 6 next-nearest neighbors),
3-D S.C.:        2^9  (central spin, PM count of 6 nearest neighbors, and PM count of 12 next-nearest neighbors),
3-D F.C.C.:      2^5  (central spin, PM count of 12 nearest neighbors),
3-D B.C.C.:      2^5  (central spin, PM count of 8 nearest neighbors).

Based on the above data, the minimum size of the TP table is determined by 2-D triangular systems; it was therefore fixed at 1024 x 32 bits. One can easily check that a 2048 x 32 bit TP table would have enabled us to use the six nearest neighbors of a simple cubic lattice directly (i.e. without PM count) as part of the TP-table address. Thus, increasing the size of the TP table by a factor of two opens the possibility of taking into account certain multi-spin interactions for 3-D simple cubic systems; the corresponding additional hardware is under construction. The present addressing scheme of the TP table is given in Fig. 4.
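The data reduction can be written out in software for the 3-D simple-cubic case roughly as follows; the packing of the address bits chosen here is an assumption made only for the illustration and does not claim to match the actual PROM wiring (the text above quotes 2^9 entries for this case, so the hardware fields may be allocated somewhat more generously than in this sketch).

    #include <stdio.h>

    /* Sketch of the PM-count data reduction for a 3-D simple-cubic site:
     * instead of all neighbor orientations, only the central spin and the
     * numbers of plus-minus (PM) bonds with the 6 nearest and the 12
     * next-nearest neighbors enter the TP-table address.  Spins are coded
     * as 0 (down) or 1 (up); a PM bond is a bond whose two spins differ.
     * The bit packing below is illustrative only. */
    static unsigned pm_count(unsigned center, const unsigned *nbrs, int n)
    {
        int i;
        unsigned count = 0;
        for (i = 0; i < n; i++)
            count += center ^ nbrs[i];        /* 1 when the spins differ */
        return count;
    }

    static unsigned tp_address(unsigned center,
                               const unsigned nn[6], const unsigned nnn[12])
    {
        unsigned addr = center;                      /* 1 bit         */
        addr |= pm_count(center, nn, 6)   << 1;      /* 0..6,  3 bits */
        addr |= pm_count(center, nnn, 12) << 4;      /* 0..12, 4 bits */
        return addr;
    }

    int main(void)
    {
        unsigned nn[6]   = { 1, 1, 0, 1, 0, 0 };
        unsigned nnn[12] = { 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1 };
        printf("TP-table address: %u\n", tp_address(1, nn, nnn));
        return 0;
    }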

The 32-bit data word, which is the output of the TP table for a certain neighbor configuration, is compared directly with a random number by a series of cascaded magnitude comparators. The result may be a spin-flip signal that is synchronized with the system clock and sent to all memory banks of the main spin memory. The spin-flip command is executed only on the memory bank whose LC table, by means of the additional bit mentioned above, indicates the presence of the central spin.
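In software, the decision made by the comparators amounts to the comparison sketched below; tp_table, rand32, and flip_decision are illustrative stand-ins only (the DISP performs this step with a hardware random-number generator and cascaded magnitude comparators, not with C code).

    #include <stdint.h>
    #include <stdlib.h>

    /* Sketch of the spin-flip decision: the TP table holds the transition
     * probability of Eq. (4.1) scaled to a 32-bit integer, and a flip
     * command is issued when a 32-bit random number falls below that
     * value.  rand32() is a crude placeholder for the random-number
     * generator and is not uniform over the full 32-bit range. */
    static uint32_t tp_table[1024];          /* 1024 x 32 bits, as in the DISP */

    static uint32_t rand32(void)
    {
        return ((uint32_t)rand() << 16) ^ (uint32_t)rand();
    }

    /* Returns 1 when a spin-flip command must be issued. */
    static int flip_decision(unsigned tp_addr)
    {
        return rand32() < tp_table[tp_addr];
    }

    int main(void)
    {
        tp_table[5] = 0x80000000u;           /* acceptance probability ~ 1/2 */
        return flip_decision(5);
    }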

4.5. Lattice-sums updating

The lattice sums are updated at the end of each MC step in which a spin-flip command is generated. The host computer uses these sums as input for the determination of miscellaneous thermodynamic quantities. Transfer of these values to the host computer, where they are accumulated and at constant intervals stored in the background memory (i.e. on disc), is programmable by means of a parameter list, pertinent to the actual experiment; this will be discussed in chapter 7.

The hardware for the updating mechanism only needs to take care of the change in the number of plus spins, the change in the number of PM bonds of the central spin with its neighbors and, for multi-spin interactions, the change in the sum of the products of the spins involved. This means that the data on the S-bus, or rather the generated TP-table address that represents the local configuration either in terms of a copy of part of the S-bus data or in terms of the already performed PM count, can be used directly for adjustment of the lattice-sum register data.

[Fig. 4 block diagram: spin bus -> neighbor switching network -> PROM +/- count stages -> tri-state switching -> transition-probability table address.]

Fig. 4: Data reduction scheme used for the transformation of data on the spin bus into addresses for the transition-probability table.

The actual hardware structure makes use of the fact that the number of plus spins, which is equivalent to the magnetization, can change by the values -1 and +1 only. Therefore a series of up-down counters keeps track of this number. The change in the number of PM bonds is always a multiple of two, with a maximum of 24 for 3-D systems (12 neighbors). In this case one of the PROM's (#1 to #8 in Fig. 5) translates the TP-table address into half the change of these lattice sums. The same principle as used for keeping track of the magnetization can now be employed when the up-down counters are preceded by a 4-bit register. The 4-bit sum obtained by adding the data supplied by the selected PROM to the data contained in the 4-bit register will occasionally result in over- or underflow, causing the counter array to add the value +1 or -1 to its original content. For this addition no adders are used, but once more a PROM (in Fig. 5: "PROM full add") that is programmed to generate the sum of the original value and the data that represent the change. Furthermore this PROM supplies one bit used for commanding the counters to change (increment or decrement) their value and one bit that tells the counters to count up or down.
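The bookkeeping performed by these PROM's and counters can be imitated in software as in the sketch below, here restricted to the magnetization and the PM bonds with the six nearest neighbors of a simple-cubic lattice; the structure and function names are illustrative only.

    #include <stdio.h>

    /* Sketch of the lattice-sum updating after an accepted spin flip.
     * Flipping the central spin changes the number of up spins by +1 or
     * -1, and turns every PM bond with a neighbor into a non-PM bond and
     * vice versa, so with 6 nearest neighbors the PM-bond count changes
     * by 6 - 2*pm, always a multiple of two.  In the DISP this arithmetic
     * is done by PROM's, 4-bit registers, and up-down counters. */
    typedef struct {
        long n_plus;     /* number of up spins (magnetization)  */
        long n_pm_nn;    /* number of PM nearest-neighbor bonds */
    } lattice_sums;

    static void update_sums(lattice_sums *s,
                            unsigned center_before,   /* 0 = down, 1 = up */
                            unsigned pm_nn_before)    /* 0..6             */
    {
        s->n_plus  += center_before ? -1 : +1;
        s->n_pm_nn += 6 - 2 * (long)pm_nn_before;
    }

    int main(void)
    {
        lattice_sums s = { 10, 20 };
        update_sums(&s, 1, 4);     /* an up spin with 4 PM bonds flips down */
        printf("n_plus = %ld, n_pm_nn = %ld\n", s.n_plus, s.n_pm_nn);
        return 0;
    }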

When an experiment is started, the lattice-sum registers must be preset to the sums that are in force for the lattice stored in the main spin memory. The method employed makes use of the facility to change the values in the TP table according to one's needs.

[Fig. 5 block diagram: PROM #1 ... #8 sum-adjust sections feeding the lattice-sum registers.]

Fig. 5: Lattice-sum updating section. The updating is performed by forwarding the TP-table address to a PROM. This PROM provides the quantity by which the lattice sum at hand has to be modified when a spin-flip command is issued. The D-bus is used for initialization of the lattice-sum registers, the S-bus for transferring the values of the lattice sums to the host computer.


When the TP table is loaded with zero and nonzero values at locations for which the central spin is up or down, respectively, when the random-number generator is set to produce zeros only, and when the main spin memory is loaded with its initial spin values, one sequential sweep through the lattice turns all spins up. When the lattice-sum registers are all cleared in advance of this sweep, they will after the sweep take on values that, aside from some simple software manipulations, represent the actual lattice sums. The generated values are routed to the host computer via the S-bus and the modified values are sent back to the lattice-sum registers via the D-bus. As the contents of the main spin memory are destroyed by this procedure, the main spin memory must be reloaded with its initial configuration. This procedure is standard and is employed in advance of each experiment (even when the lattice sums are trivial). The same is done at regular intervals during and at the end of each experiment. The latter is not strictly necessary but enables one to check whether the experiment, as far as the lattice sums are concerned, ran correctly.
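The initialization trick can be paraphrased in software as below; the array and counter names are illustrative, and only the magnetization counter is shown.

    /* Sketch of the lattice-sum initialization sweep: with the TP table
     * arranged so that only down spins produce a flip command and the
     * random numbers forced to zero, one deterministic sweep turns every
     * spin up while the (previously cleared) counter accumulates the
     * information needed to reconstruct the initial lattice sums.
     * Illustration only; names do not refer to actual DISP registers. */
    #define N_SPINS (64 * 64)

    static unsigned lattice[N_SPINS];   /* 0 = down, 1 = up               */
    static long     counter_plus;       /* accumulates one count per flip */

    static void init_sweep(void)
    {
        unsigned i;
        counter_plus = 0;                        /* clear the register */
        for (i = 0; i < N_SPINS; i++) {
            if (lattice[i] == 0) {               /* nonzero TP entry, RNG = 0 */
                lattice[i] = 1;                  /* flip command issued */
                counter_plus += 1;               /* this was a down spin */
            }
        }
        /* counter_plus now equals the initial number of down spins; the
         * host computer converts this into the initial magnetization.
         * The original configuration must be reloaded afterwards. */
    }

    int main(void) { init_sweep(); return 0; }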

4.6. Monte Carlo Renormalization Group calculations

According to Swendsen [4], the critical behavior of Ising models can be studied by means of Monte Carlo Renormalization Group (MCRG) calculations. The idea behind MCRG is that real-space renormalization transformations [22] can be formulated in terms of correlations between certain lattice sums. This means that it is not necessary to derive the renormalization transformation in coupling parameter space. It is sufficient to derive the "renormalized" block-spin configurations and to calculate the averages of the desired lattice sums and their correlations.

The time-consuming repetitive part of the MCRG algorithm can be summarized as follows (a schematic sketch of this loop is given below the list):

1. Apply the MC process to an Ising system until a (sufficiently) independent configuration is obtained.

2. Calculate the lattice sums.

3. Renormalize the spin configuration and derive the renormalized lattice sums.

4. Repeat step 3 until the smallest lattice size the DISP can handle (8 x 8 or 4 x 4 x 4) is reached.

5. Repeat steps 1 to 4 a large number of times, accumulating the lattice sums and their renormalized values.
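A schematic driver for this loop might look as follows; all functions are placeholder stubs standing in for DISP operations and host-computer bookkeeping, and none of the names refer to the actual DISP software.

    #include <stdio.h>

    /* Placeholder stubs; illustration only. */
    static int current_size = 64;

    static void monte_carlo_sweeps(int n) { (void)n; /* run the MC process */ }
    static void accumulate_lattice_sums(int level)
    {
        printf("accumulating sums at renormalization level %d\n", level);
    }
    static int block_spin_transform(void)
    {
        current_size /= 2;          /* 2x2 (or 2x2x2) blocks halve the size */
        return current_size;
    }

    static void mcrg_run(int n_samples, int decorrelation_sweeps, int min_size)
    {
        int sample, level, size;
        for (sample = 0; sample < n_samples; sample++) {
            monte_carlo_sweeps(decorrelation_sweeps);  /* step 1 */
            accumulate_lattice_sums(0);                /* step 2 */
            level = 0;                                 /* steps 3 and 4 */
            do {
                size = block_spin_transform();
                level++;
                accumulate_lattice_sums(level);
            } while (size > min_size);
            current_size = 64;                 /* restore for the next sample */
        }                                      /* step 5: the outer loop */
    }

    int main(void) { mcrg_run(2, 1000, 8); return 0; }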


In addition to MC simulations, the MCRG procedure is concentrated around two basic operations: block-spin transformations and the calculation of lattice sums, e.g. the magnetization, pair-correlation functions, and multi-spin correlation functions. The original structure of the DISP, outlined in the preceding chapter, has turned out to be suited to incorporate these MCRG calculations, owing to the flexibility of the design, to the type of periodic boundary conditions chosen, and to its modularity. This incorporation will now be discussed.

4.6.1. Block-spin transformation

The renormalization transformation implies the substitution of a single spin, the block spin, for a block consisting of a number of spins. Here only blocks of 2 x 2 spins in 2-D or 2 x 2 x 2 spins in 3-D systems are considered. When the lattice is divided into such elementary blocks, a "renormalized" block spin can be calculated according to a majority rule. A random factor can be included that depends on the sum of the spins that are relevant for the local transformation. Table 4.1 and table 4.2 list the possible choices for spins that are considered to take on the values -1 or +1. Note that the MCRG transformation is symmetrical with respect to the situation where the number of down spins is equal to the number of up spins: up-down symmetry is conserved by the MCRG transformation.
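A software version of the 2-D block-spin rule with the random factors of table 4.1 is sketched below; uniform01() is a placeholder random-number generator, and the values of alpha and beta in the example call are arbitrary.

    #include <stdlib.h>

    /* Sketch of the 2-D block-spin (majority) rule with random tie
     * breaking, following table 4.1: a block of 2 x 2 spins (values -1 or
     * +1) is replaced by +1 with probability alpha, beta, 1/2, 1-beta or
     * 1-alpha, depending on the sum of the four spins (4, 2, 0, -2, -4). */
    static double uniform01(void)          /* placeholder RNG */
    {
        return rand() / ((double)RAND_MAX + 1.0);
    }

    static int block_spin_2d(const int spins[4], double alpha, double beta)
    {
        int sum = spins[0] + spins[1] + spins[2] + spins[3];
        double p_plus;
        switch (sum) {
        case  4: p_plus = alpha;        break;
        case  2: p_plus = beta;         break;
        case  0: p_plus = 0.5;          break;
        case -2: p_plus = 1.0 - beta;   break;
        default: p_plus = 1.0 - alpha;  break;   /* sum == -4 */
        }
        return uniform01() < p_plus ? +1 : -1;
    }

    int main(void)
    {
        int block[4] = { +1, +1, -1, +1 };
        /* 0.95 and 0.8 are arbitrary example values, not taken from the text. */
        return block_spin_2d(block, 0.95, 0.8) == +1 ? 0 : 1;
    }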

Three factors make it possible to use the standard hardware of the DISP for block-spin transformations:

1. Within the DISP the neighbors of a spin can be chosen quite arbitrarily (the neighbors of the other spins follow by translational invariance).

2. The TP table may just as well contain values in which the majority rule (including the random factors α, β, γ, and δ of table 4.1 and table 4.2) is incorporated.

3. The available RNG can provide the random numbers needed for block-spin determination.

In practice, when all functions of the DISP are set correctly, just one sweep through the relevant part of the main spin memory will do the job; in this sweep only one spin out of each elementary block has to be visited (leap-frog sweep). As the DISP can simulate lattices that differ in size by a factor of two in every direction, from 128^3 down to 4^3 in three dimensions or from 2048^2 down to 8^2 in two dimensions, scaling down the lattice over a number of intermediate renormalization levels can be performed by mere repetition. Evidently, the procedure of successive block-spin transformations entirely destroys the original spin configuration produced by the standard MC simulation. It is therefore necessary to store a copy of the original configuration before the block-spin transformations are started. Another aspect of performing successive block-spin transformations is the necessity to scale down the lattice within the main spin memory (by throwing away all spins except the generated block spins). This is achieved by temporary storage of the block spins outside the main spin memory, followed by writing them back in a "packed" manner (see chapter 4.7), i.e. in accordance with the main spin-memory organization described in chapter 4.3.

The functions of the DISP described above require different address counters. For block-spin transformation, only every second position in the two or three directions must be addressed. For block-spin storage (the LC tables are programmed to select all block spins within a single cell, that is, sixteen block spins in 2-D or eight block spins in 3-D),

                          block-spin probability
sum of 4 spins (2-D)        P+            P-
         4                   α            1-α
         2                   β            1-β
         0                  1/2           1/2
        -2                  1-β            β
        -4                  1-α            α

Table 4.1: Transition probabilities for 2-D block-spin determination

                          block-spin probability
sum of 8 spins (3-D)        P+            P-
         8                   α            1-α
         6                   β            1-β
         4                   γ            1-γ
         2                   δ            1-δ
         0                  1/2           1/2
        -2                  1-δ            δ
        -4                  1-γ            γ
        -6                  1-β            β
        -8                  1-α            α

Table 4.2: Transition probabilities for 3-D block-spin determination
