CA1159151A - Cellular network processors - Google Patents

Cellular network processors

Info

Publication number
CA1159151A
Authority
CA
Canada
Prior art keywords
cells
cell
processors
group
processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired
Application number
CA000338066A
Other languages
French (fr)
Inventor
Gyula A. Mago
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Application granted granted Critical
Publication of CA1159151A publication Critical patent/CA1159151A/en
Expired legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/448Execution paradigms, e.g. implementations of programming paradigms
    • G06F9/4494Execution paradigms, e.g. implementations of programming paradigms data driven

Abstract

A CELLULAR NETWORK OF PROCESSORS

ABSTRACT OF THE DISCLOSURE
A network of processors having a cellular structure is capable of directly and efficiently executing a predetermined class of programming languages such as applicative languages.
The network includes two interconnected networks of processors, one of which is a linear array of cells of a first kind and the other a tree network of cells of a second kind. The network directly interprets a high level language and is capable of operating on a wide range of classes of programs. Within practical limits, the network accommodates the unbounded parallelism permitted by applicative languages in a single user program. The network also has the capability of executing many user programs simultaneously.

Description


This invention relates to information handling systems, and more particularly to information handling systems having a plurality of cellular processors connected in a predetermined structure to efficiently execute predetermined classes of programming languages.
Applicative languages have been defined in papers by J.W. Backus entitled "Programming Language Semantics and Closed Applicative Languages", Conference Record of the ACM Symposium on Principles of Programming Languages, Boston, Massachusetts, October 1973, and "Reduction Languages and Variable-Free Programming", IBM Research Report RJ1010, Yorktown Heights, New York, April 1972. In these papers, Backus defines classes of programming languages having a high degree of versatility in problem solving.
For further description and discussion of applicative languages, reference is made to the Backus papers, which are incorporated by reference although not fully set forth herein.
Another reference to applicative languages, and more particularly to a system for realising a specific reduction language as a machine, is shown by Berkling in his paper entitled "Reduction Languages for Reduction Machines", Proceedings of the Second Annual Symposium on Computer Architecture, Houston, Texas, January 1975. As above, although Berkling is not set forth completely herein, reference is made to the concepts relating to reduction languages in the Berkling paper.
Reduction languages, as described by Backus, are a class of high level, applicative programming languages with several unique and attractive properties. (In an applicative programming language the elementary step of computation is the application of a function or operator to its operand(s).) Among currently used programming languages they can be likened to APL or LISP, but the property of greatest interest is that they are capable of expressing parallelism in a natural fashion, not requiring instruction from the programmer to initiate and terminate execution paths.
Any program written in a reduction language contains few syntactic markers and so-called atomic symbols. The latter may serve as data items, primitive operators, or names of defined functions. Only two kinds of composite expressions are allowed, each composed of other expressions, which are usually referred to as subexpressions. A sequence of length n, n ≥ 0, is denoted by (a1, a2, ..., an) if n ≥ 1 and by the empty sequence otherwise, where ai (called the ith element of the sequence) is an arbitrary well-formed expression. An application is denoted by <a, b>, where a (called the operator) and b (called the operand) are again well-formed expressions.
Of these two forms of composite expressions only applications specify computations. Since the program text at any time may contain many applications, possibly nested, sequencing among them is specified (at least partially) by requiring that only innermost applications can be executed. There is no sequencing requirement among innermost applications; they can be executed in any order. The process of executing an innermost application is called a reduction, and so we shall often refer to an innermost application as a reducible application (RA).
A reduction results in replacing an innermost application with the result expression, which may, in turn, contain further applications. The reduction rules relevant to innermost applications can be summarized as follows. If the operator is an atomic symbol, it might be a primitive operator (in which case its effect is specified by the language definition), or it might be the name of a defined function, i.e., of some well-formed expression containing no application (in which case the atomic symbol is replaced by that expression). If the operator is a sequence, it is interpreted as a composite operator, composed of the elements of the sequence (Backus describes two possible alternatives: regular and meta composition). The computation comes to a halt when there are no more reducible applications left, and the program text so produced is the result of the computation. If the result of a reduction is undefined, a special symbol is used to denote it, which is neither a syntactic marker nor an atomic symbol, but a special expression.
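The notion of an innermost (reducible) application can be sketched in present-day terms. The following Python fragment is purely illustrative, not the patent's cell-level mechanism: sequences are modeled as tuples, and the class `App` is an assumed stand-in for an application `<a, b>`.

```python
class App:
    """Illustrative stand-in for an application <operator, operand>."""
    def __init__(self, operator, operand):
        self.operator, self.operand = operator, operand

def contains_app(e):
    """True if expression e contains any application."""
    if isinstance(e, App):
        return True
    if isinstance(e, tuple):  # sequences are modeled as tuples
        return any(contains_app(x) for x in e)
    return False

def is_innermost(e):
    """An application is reducible (an RA) exactly when neither its
    operator nor its operand contains a further application."""
    return (isinstance(e, App)
            and not contains_app(e.operator)
            and not contains_app(e.operand))

inner = App('*', (2, 12))           # <*, (2, 12)>  is an RA
outer = App('+', (inner, (3, 13)))  # not innermost: its operand holds an application
```

Since there is no sequencing requirement among innermost applications, any expression satisfying `is_innermost` may be reduced first.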

The following example should illustrate most of these concepts. Assume that IP (inner product) is a defined operator, representing IP ≡ (+, (AA,*), TR), whereas AA (apply to all), TR (transpose), + (addition), and * (multiplication) are primitive operators of a reduction language. Suppose the initial program text is <IP, ((1,2,3,4), (11,12,13,14))>. First IP gets replaced by its definition, resulting in <(+, (AA,*), TR), ((1,2,3,4), (11,12,13,14))>. Since the operator now is a regular composition of three expressions, after a few reductions we get the following program text:
<+, <(AA,*), <TR, ((1,2,3,4), (11,12,13,14))>>>
Now TR is the operator of the only innermost application, and the result of applying TR to the two sequences of the operand is <+, <(AA,*), ((1,11), (2,12), (3,13), (4,14))>>. The only innermost application has a composite operator again, but this time it is a meta composition, resulting in <+, (<*, (1,11)>, <*, (2,12)>, <*, (3,13)>, <*, (4,14)>)>.

Now we have four innermost applications, and they can be reduced in any order, but the addition operator cannot be applied until all multiplications are complete, so at some point we must have the program text <+, (11,24,39,56)>, which finally reduces to the number 130.
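The reduction trace above can be mirrored in present-day code. The sketch below is illustrative only; `TR`, `AA` and `mul` are assumed Python stand-ins for the patent's primitive operators, not its microprograms.

```python
def TR(seqs):
    """Transpose: a sequence of sequences becomes a sequence of tuples."""
    return tuple(zip(*seqs))

def AA(f, seq):
    """Apply-to-all: distribute the operator f over every element of seq."""
    return tuple(f(x) for x in seq)

mul = lambda pair: pair[0] * pair[1]   # stand-in for the * primitive

operand = ((1, 2, 3, 4), (11, 12, 13, 14))
pairs = TR(operand)        # ((1, 11), (2, 12), (3, 13), (4, 14))
products = AA(mul, pairs)  # (11, 24, 39, 56)
result = sum(products)     # 130
```

The four multiplications in `AA` are independent, which is exactly the parallelism the network exploits; here they merely run in sequence.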
In the prior art, there are many teachings relative to multiple processor information handling systems.


An article by William L. Spetz, appearing in Computer magazine, July 1977, at page 64 and following, entitled Microprocessor Networks, gives a general survey of a number of approaches to connecting a number of independent processors to enhance system performance. Although Spetz does at page 66 discuss hierarchical pyramid systems, he does not discuss a cellular processor in the sense of the instant invention, nor does Spetz consider the application of these multiprocessing systems to the execution of programs in applicative languages.
In fact, Spetz indicates in his discussion that the pyramid system is generally not to be favoured because of what he considers to be practical, reliability, turn-around time and developmental difficulties.
U.S. Patent 3,978,452 discloses a data driven network of uniform processing or function modules coupled with a local storage unit which may be partitioned to accommodate various concurrent operations. The device of the patent is not cellular in construction, nor does it contemplate the execution of language primitives by the cooperation of a group of processors.
U.S. Patent 3,962,706 shows a data processing system for executing stored programs in a parallel manner. The device of this patent is not cellular in construction, and it does not contemplate the execution of language primitives by the co-operation of a group of processors.

U.S.Patent 3,593,300 discloses a parallel processor system in which all of the processors are dependent upon a single large scale storage system in the classical von Neumann manner.

The patent does not contemplate execution of programs in an applicative language.

U.S. Patent 3,959,775 teaches a multiprocessing system employing a number of microprocessors as processing elements, in which a system control assigns priority to each of the microprocessors for access to the system bus and access to the system memory. Again the teachings of the patent relate to a conventional von Neumann computer and do not contemplate solution of problems stated in an applicative language.
U.S. Patent 3,940,743 teaches a system interconnect unit for connecting together a number of independently operating data processing systems wherein, when a unit accesses an interconnect system, the unit acts as a peripheral device in the conventional von Neumann manner. The patent does not teach execution of problems in applicative languages.
U.S.Patent 3,566,363 teaches an early multiprocessing computing system including a specific discussion of inter-processor communication and control. As above, the patent is concerned only with conventional von Neumann computer systems and does not contemplate problem solving using applicative languages.
In a paper entitled "A General Purpose Array with a Broad Spectrum of Applications", published in the Proceedings of the Workshop on Computer Architecture, May 22 and 23, 1975, Erlangen, Handler et al discuss a system of processors in which processors defined as array processors function to process jobs submitted by a second level of processor. Each of the processors discussed in the paper is capable of executing programs of a predetermined size, and thus a small network containing as few as four array processors can function to execute various kinds of programs. Although the paper discusses the system as a freely extensible cellular structure, it is impossible for all cells to be identical; for example, the paper states "the C processor is responsible for overall control, for allocating suitable parts of the array to jobs, and for providing the B processors with control information". From this it is clear that the C processor must have full knowledge about all A and B processors under it. That is, it must know what

they contain, etc. (In addition, as the size of the pyramid increases, the size of the B and C processors must be proportional to how high up in the pyramid they are, hence the network cannot be cellular.) The paper does not address the problem of how to effectively program such a system to take maximum advantage of the parallel processing capabilities provided. The paper does not deal with the solution of problems expressed in applicative languages.
U.S. Patent 3,646,523 discloses a computer whose machine language is the lambda calculus. However, the apparatus disclosed is not cellular in nature, but rather it comprises a number of von Neumann processors, a large linear memory, and a central communication facility, in which an associative memory system is used as the common element between a number of processing units.
The following is a list of publications which may relate to processor networks or applicative languages:
J.B. Dennis, "Computation Structures", COSINE Committee Lectures, Princeton University, Department of Electrical Engineering, Princeton, New Jersey, July 1968.
J.B. Dennis, "Programming Generality, Parallelism and Computer Architecture", Information Processing 68, North-Holland Publishing Co., 1969, pp. 484-492.
J.B. Dennis, "First Version of a Data-Flow Procedure Language", Lecture Notes in Computer Science, vol. 19, pp. 362-376, Springer Verlag, N.Y., 1974.
J.B. Dennis and D.P. Misunas, "A Preliminary Architecture for a Basic Data-Flow Processor", Proceedings of the 2nd Annual Symposium on Computer Architecture, IEEE, N.Y., 1975, pp. 126-132.
A.W. Holt and F. Commoner, "Events and Conditions", Record of the Project MAC Conference on Concurrent Systems and Parallel Computation, ACM, N.Y., 1970, pp. 3-52.
P. McJones, A Church-Rosser Property of Closed Applicative Languages, IBM Research Report RJ1589, Yorktown Heights, N.Y., May 1975.
S.S. Patil and J.B. Dennis, "The Description and Realization of Digital Systems", Revue Française d'Automatique, Informatique et de Recherche Opérationnelle, pp. 55-69, February 1973.
C.A. Petri, Kommunikation mit Automaten, Schriften des Rheinisch-Westfälischen Instituts für Instrumentelle Mathematik an der Universität Bonn, Nr. 2, Bonn, 1962.
K.J. Thurber, Large Scale Computer Architecture - Parallel and Associative Processors, Hayden Book Co., Rochelle Park, N.J., 1976.

It is an object of the present invention to directly execute a predetermined class of applicative languages in an information handling system which includes a plurality of interconnected cells each containing at least one processor.
It is another object of the present invention to directly and efficiently execute a predetermined class of applicative languages in an information handling system including a plurality of interconnected cells each containing at least one processor.
It is another object of the present invention to directly and efficiently execute a predetermined class of applicative languages by providing an unbounded degree of parallelism in the execution of any user program.
It is yet another object of the present invention to directly and efficiently execute a predetermined class of applicative languages by providing for the simultaneous execution of a plurality of user programs.
It is yet another object of the present invention to directly and efficiently execute a predetermined class of applicative languages in an information handling system that has a cellular structure and whose size may be extended without limit.
It is yet another object of the present invention to directly and efficiently execute a predetermined class of applicative languages in an information handling system including a plurality of interconnected cells, each containing at least one processor, and logic means connecting the processors to form disjoint assemblies of the processors, the logic means being responsive to syntactic markers to partition the plurality of interconnected cells into separate disjoint assemblies of processors in which subexpressions can be evaluated.
It is yet another object of the present invention to directly and efficiently execute a predetermined class of applicative languages in an information handling system including a tree network of interconnected cells, each containing at least one processor, and an array of cells wherein each cell in the array is connected to one of the cells in the tree network; logic means for connecting the processors to form disjoint assemblies of the processors, the logic means being responsive to syntactic markers to partition the plurality of interconnected cells into separate disjoint assemblies of processors in which subexpressions can be evaluated; and input/output means for entering applicative expressions into the cells and for removing results from the cells after evaluation of the applicative expressions.
It is yet another object of the present invention to execute a predetermined class of applicative languages in an information handling system as above wherein each of the cells in the array of cells is connected to one or more other cells in the array.
Accordingly, an information handling system for parallel evaluation of applicative expressions formed from groups of subexpressions, said groups of subexpressions being separated by syntactic markers, includes a plurality of interconnected cells each containing at least one processor, the cells being divided into a tree structure of interconnected cells and a group of cells, each cell of which is connected to one of the cells in the tree network; logic means for connecting the processors within the cells to form disjoint assemblies of the processors, the logic means being responsive to the syntactic markers to partition the plurality of interconnected cells into separate disjoint assemblies of the processors in which subexpressions can be evaluated; and input/output means for entering applicative expressions into the cells and for removing results from the cells after evaluation of the applicative expressions.
The invention also includes a method for parallel evaluation of applicative expressions formed from groups of subexpressions in a network of processors wherein the groups of subexpressions are separated by syntactic markers, the method including the steps of: entering information into a first group of cells, each cell being adapted to contain at least one symbol of an applicative expression; partitioning the network of processors into one or more disjoint assemblies of processors under control of information contained in the first group of cells; executing an executable subexpression in each disjoint assembly containing an executable subexpression; determining whether further executable subexpressions reside in the first group of cells; repeating the partitioning step in accordance with information contained in the first group of cells after the executing step if further executable subexpressions reside in the first group of cells; and removing results from the first group of cells.
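The control flow of these method steps, reduce innermost subexpressions, test whether any remain, repeat, can be sketched sequentially. The fragment below is an assumed illustration with a tiny language of only + and *; the real invention performs the reductions in parallel across disjoint assemblies of cells, which this sketch does not model.

```python
def is_app(e):
    """An application <a, b> is modeled here by the illustrative tuple ('ap', a, b)."""
    return isinstance(e, tuple) and len(e) == 3 and e[0] == 'ap'

def evaluate(e):
    """Reduce applications innermost-first until none remain."""
    if is_app(e):
        operand = evaluate(e[2])        # inner applications reduce first
        if e[1] == '+':
            return sum(operand)
        if e[1] == '*':
            result = 1
            for v in operand:
                result *= v
            return result
        return 'undefined'              # outside this sketch's tiny language
    if isinstance(e, tuple):            # a sequence: evaluate elementwise
        return tuple(evaluate(x) for x in e)
    return e                            # an atomic symbol

# <+, (<*, (1,11)>, <*, (2,12)>, <*, (3,13)>, <*, (4,14)>)>
prog = ('ap', '+', (('ap', '*', (1, 11)), ('ap', '*', (2, 12)),
                    ('ap', '*', (3, 13)), ('ap', '*', (4, 14))))
```

`evaluate(prog)` walks the same reduction order the text describes: the four multiplications may complete in any order, but the addition waits for all of them.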
It is a primary feature of the present invention that an information handling system architecture is provided capable of directly executing a predetermined class of applicative languages, in which the parallel processing capabilities of a number of cellular processors may be utilized to maximum advantage to improve system throughput.
The network of processors according to the present invention directly interprets a high level applicative language and efficiently executes problems in the high level language, accommodating the unbounded parallelism permitted by applicative languages in any single user program.
It is another feature of the present invention that an information handling system according to the present invention accommodates unbounded parallelism, which permits execution of many user programs simultaneously.
It is a primary advantage of the present invention that execution of one or more user programs is accomplished simultaneously, including the simultaneous reduction of a number of expressions into subexpressions which are simultaneously evaluated, thus greatly enhancing the speed of execution and achieving significant increases in throughput of an information handling system according to the present invention.
It is another advantage of the present invention that the cellular nature of the processor results in lower design cost of cells, lower cost of replacement, reduced interconnection costs and most significantly, reduction in the cost of pro-gramming the network.
These and other objects, features and advantages of the present invention will become apparent by reference to the following drawings and detailed description of a preferred embodiment of the invention.
Figure 1 is a schematic representation of an expression in tree form according to an embodiment of the present invention;
Figure 2 is a schematic representation of an expression in internal representation according to an embodiment of the present invention;


Figure 3a is a schematic representation of a reduction showing an example of a primitive operator according to the present invention shown in tree form before a reduction;
Figure 3b is a schematic representation of a reduction showing an example of a primitive operator according to the present invention shown in tree form after a reduction;
Figure 4a is a schematic representation of a reduction showing an example of a primitive operator according to the present invention in internal representation before a reduction;
Figure 4b is a schematic representation of a reduction showing an example of a primitive operator according to the present invention in internal representation after a reduction;

Figure 5a is a schematic representation of a reduction showing a second example of a primitive operator according to the present invention in internal representation before a reduction;
Figure 5b is a schematic representation of a reduction showing a second example of a primitive operator according to the present invention in internal representation after a reduction;
Figure 6a is a schematic representation of a reduction showing a third example of a primitive operator according to the present invention shown in tree representation before a reduction;
Figure 6b is a schematic representation of a reduction showing a third example of a primitive operator according to the present invention shown in tree representation after a reduction;


Figure 7a is a schematic representation of a reduction showing a third example of a primitive operator according to the present invention shown in internal representation before a reduction;
Figure 7b is a schematic representation of a reduction showing a third example of a primitive operator according to the present invention shown in internal representation after a reduction;

Figure 8 is a schematic representation of an inter-connection of cellular processors according to the present invention;
Figure 9 is a schematic representation of several examples of partitioning a T cell according to the present invention;
Figure 10 is a network of processors according to the present invention partitioned into six separate areas;
Figure 11 is a state change diagram representing state changes in a processor according to the present invention containing a single area with eight L cells;
Figure 12 is a schematic representation of a network of processors according to the present invention at an instant of time during a downward cycle;
Figure 13 is a state diagram of a node of an area;
Figure 14 is a schematic block diagram representation of an L cell;
Figure 15 is a schematic block diagram representation of a T cell, shown with Figure 9. Figure 15a is a processor block diagram;

Figure 16 is a schematic representation of the processing to be performed in L cells to execute the reduction shown in Figures 3 and 4 above;
Figure 17 shows a microprogram for execution of the primitive operator AL;
Figure 18 shows a microprogram for the execution of the primitive operator AND;
Figure 19 shows a microprogram for execution of the primitive operator AA;
Figure 20 is a representation of the contents of registers in an L cell for a sample microinstruction;
Figure 21 is a flow chart showing how an operator expression is to be interpreted according to the present invention;
Figure 22 shows several examples of partitioning of a T cell in accordance with upward flow of information from other cells (either T or L) in a tree structure of processors according to the present invention;
Figure 23 shows an algorithm in flow chart form for computing a characterization code W which characterizes a node of an active area, and appears with Figure 21;
Figure 24 shows a tree structure of processors according to the present invention showing means for locating the right end of an RA and transforming an area into an active area;
Figure 25 shows an algorithm in the flow chart form for computing the directory;
Figure 26 is a schematic representation of a process of distributing segments of microprograms inside an active area.


Figure 27 is a tree representation of identifying symbols which identify nodes of an RA at the top levels of an active area;
Figure 28 is a flow chart representation of an algorithm according to the present invention for computing an index (IN) in a node of an active area;
Figure 29 shows an example of the computation of specifications for storage management according to the present invention;
Figure 30 is a chart showing the contents of a number of cells in L during storage management at seven instants of time;
Figure 31 is a schematic representation of a portion of a processor according to the present invention showing input and output channels in the form of a binary tree.
Before describing how the information handling system operates according to the present invention, its function will be explained. That, however, can be done only after describing how the syntactic and semantic aspects of applicative languages are represented in the processor, because this representation determines, to a large extent, the capabilities the processor must have in order to act as an interpreter for applicative languages.
1. Representation of Syntactic Aspects.
Expressions will be represented as linear strings of symbols derived from the representation described in Section 1.B as follows: since any well-formed expression of an applicative language is obtained by nesting sequences and applications in each other, we can associate a nesting level with each symbol of an expression; we shall store the nesting level with each symbol and then eliminate all closing brackets, ) and >, from the source text, for they have become superfluous.
This representation corresponds to the symbol sequence obtained by a preorder traversal of the natural tree representation of the original text, in which the root of each nontrivial subtree is labelled with ( or <.
The internal representation of the processor is finally obtained by placing pairs of symbols, each consisting of a program text symbol and its nesting level, into a linear array of identical hardware cells, one pair per cell, preserving the order of the program text symbols from left to right (see Figures 1 and 2). As a result of placing at most one program text symbol into a cell, the need for explicit separators between symbols vanishes, because now the cell boundaries perform this function. We shall always assume that there is a sufficient number of cells available to hold our symbols; if there are more than the required number of cells, then some of them will be left empty. From the point of view of the representation, the number and location of these empty cells relative to the symbols of the program text is of no consequence; the result of the reduction is not influenced by the empty cells, although the time needed to complete the reduction might be.
As an example, consider one of the intermediate expressions that appeared in the example given above:
<+, <(AA,*), <TR, ((1,2,3,4), (11,12,13,14))>>>.
The tree representation of this expression is shown in Figure 1 which also shows the (nesting) level number of every symbol.
Figure 2 shows the internal representation of the same expression.
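The encoding just described, one symbol-nesting-level pair per cell with all closing brackets dropped, can be sketched as follows. This is a hypothetical illustration: the function name `linearize` and the modeling of an application `<a, b>` by the tagged tuple `('<', a, b)` are assumptions of the sketch, not the patent's notation.

```python
def linearize(expr, level=0):
    """Return the (symbol, level) pairs of a preorder traversal, one per cell.
    Sequences are plain tuples; an application <a, b> is ('<', a, b)."""
    if isinstance(expr, tuple) and expr and expr[0] == '<':   # an application
        return ([('<', level)]
                + linearize(expr[1], level + 1)
                + linearize(expr[2], level + 1))
    if isinstance(expr, tuple):                               # a sequence
        out = [('(', level)]
        for sub in expr:
            out += linearize(sub, level + 1)
        return out
    return [(expr, level)]                                    # an atomic symbol

# <AND, (T, F)> occupies five cells; no closing brackets are stored:
cells = linearize(('<', 'AND', ('T', 'F')))
```

Note that the closing brackets never appear in the output: the cell boundaries, together with the stored level numbers, carry the same information.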
2. Representation of the Semantic Aspects of Innermost Applications.
The semantics of an applicative language is determined by a set of rules prescribing how all the possible reductions should be performed. These include rules specifying the effect of each primitive operator, and rules to decompose composite (regular and meta) operators.


By examining what forms these rules take when using our chosen internal representation for expressions, we will be able to see what kinds of computations the processor will have to be able to perform.
a. Description of Primitive Operators.
An applicative language may have a large number of primitives, but the computational requirements of all of them can be classified into three easily distinguishable categories.
They will be explained with the help of the following operators:
AL (Apply to Left element)
AL is a meta operator, and its effect is the following:
<(AL, f), (y1, y2, ..., yn)> → (<f, y1>, y2, ..., yn)
and if the operand is not in the required form, the result is undefined. Here the underlined symbols are metalinguistic variables, and they stand for arbitrary constant expressions, i.e. expressions containing no applications. The arrow → is used to denote that reducing the expression on the left yields the expression on the right.
Figure 3 shows the effect of (AL, (HEAD, TAIL)) on a particular three-element sequence using tree representation.
As the definition prescribes it, (HEAD, TAIL) is applied to the leftmost element of the sequence, the rest of the sequence is left alone.
Figure 4 depicts the same reduction using the internal representation, again showing the program text first before, then after the reduction. Examination of the cells reveals that one of the following things has happened to each one of the symbol-nesting level pairs: the symbol only is rewritten (as in cell 2), the level number only is rewritten (as in cell 16), both symbol and level number are rewritten (as in cell 5), or no change (as in cells 1 and 4).
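The AL rule itself can be sketched in a few lines of present-day code. This is an assumed illustration, not the cell-level rewriting the patent performs: `f` is modeled as a plain Python function and is applied immediately, and `head_tail` is a hypothetical stand-in for the (HEAD, TAIL) operator of Figures 3 and 4.

```python
def AL(f, seq):
    """<(AL, f), (y1, y2, ..., yn)> -> (<f, y1>, y2, ..., yn):
    only the leftmost element changes; the rest of the sequence is left alone."""
    head, *rest = seq
    return (f(head),) + tuple(rest)

# Illustrative stand-in for (HEAD, TAIL) applied to a sequence:
head_tail = lambda y: (y[0], y[1:])

before = ((1, 2), (3, 4), (5, 6))
after = AL(head_tail, before)   # ((1, (2,)), (3, 4), (5, 6))
```

Because elements y2 through yn are untouched (they merely move up one tree level in the cell encoding), the rewriting of each cell can proceed independently, which is what makes AL a type A primitive in the text that follows.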

The processing requirements of this primitive will be called type A requirements, and they are characterized by the following: (1) the result expression can be produced in the cells that held the original reducible application, i.e., there is no need for additional cells; (2) the processing activities that are to be applied to any symbol of the original RA are known before execution begins, hence if prescriptions for these activities are placed into the cells before execution begins, they can be performed independently of each other, in any order (possibly simultaneously), and there is no need for any communication between these cells during execution. (This conclusion is independent of the expressions which replace the metalinguistic variables of the definition, because the expressions f and y1 are left intact, and the only change to expressions y2 through yn is their being moved up the tree by one level.)

AND
AND is a regular operator, and its effect is defined as follows:
<AND, (x, y)> → z, where x and y are expected to be atomic symbols, either T (true) or F (false). If both x and y have the value T, then the result z is T; if both of them have Boolean values, but at least one of them is F, then the result is F; and in every other case the result is undefined.
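The rule just stated can be sketched directly; the string `'undefined'` below is an illustrative stand-in for the special expression denoting an undefined result, not the patent's symbol.

```python
def AND(x, y):
    """Sketch of the AND rule: defined only on the atoms 'T' and 'F'."""
    if x == 'T' and y == 'T':
        return 'T'
    if x in ('T', 'F') and y in ('T', 'F'):
        return 'F'
    return 'undefined'   # stand-in for the special undefined-result expression
```

Unlike AL, the result depends on the values of both operand elements, so the cells holding them must communicate before the result can be written: the data-dependent, type B behaviour described next.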
Figure 5 shows an example of the application of AND to (T,F) in internal representation. (Because of the simplicity of this example, we skip the tree representation.) Although the processing requirements of this primitive include some type A processing (for example, the symbol AND and its level number can be erased irrespective of the operand expression), there are some new elements also. They are included in the following, which we call type B requirements: (1) the result expression can be produced in the cells that held the original reducible application, i.e. there is no need for additional cells; (2) at least some of the processing activities are data dependent,


and as a consequence, there is a need for communication at least among some of the cells during execution, and also there are certain timing constraints. (In our example, the two components of the operand determine whether to produce F or T as the result, and this result cannot be produced in cell 2 before bringing together the contents of cells 8 and 11.)
AA (Apply to All Elements)
AA is a meta operator, and its effect is the following:
<(AA, f), (y1, y2, ..., yn)> → (<f, y1>, <f, y2>, ..., <f, yn>)

and if the operand is not in the required form, the result is undefined.
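The AA rule can be sketched as follows, assuming a minimal representation in which sequences are Python tuples and an application <f, y> is written as a tagged tuple ('apply', f, y); these conventions are illustrative only.

```python
def reduce_aa(f, operand):
    """Reduce <(AA, f), (y1, ..., yn)> to (<f, y1>, ..., <f, yn>).
    Applications are represented as ('apply', f, y) tuples."""
    if not isinstance(operand, tuple):
        return None                      # undefined: operand is not a sequence
    return tuple(("apply", f, y) for y in operand)
```

For the four-element sequence of Figure 6, `reduce_aa("*", ((1, 11), (2, 12), (3, 13), (4, 14)))` produces four applications of * to the pairs.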
Figure 6 shows the effect of (AA, *) on a particular four-element sequence using tree representation.
The same reduction is depicted in Figure 7 using internal representation.
The processing requirements of this primitive are called type C requirements, and they are characterized by the following property: the result expression does not fit into the cells that held the original reducible application, hence there is a need for additional cells.
Since the number of insertions, and the length of the expressions to be inserted, are not generally known before execution begins, a complex rearrangement of the whole RA may be necessary, the details of which must be worked out at run-time. (For example, with AA the number of insertions is n-1, where n is the length of the operand. In Figure 7 there are three insertions, and each insertion contains the symbols ( and *.) Making room for expressions to be inserted (called storage management, see Section B.6.f.) is possible only if the length of any expression to be inserted is known at all the places of insertion (indicated with arrows in Figure 7).
b. Defined Operators
Whenever an atomic symbol for which a definition exists gets into the operator position of a reducible application, it must be replaced by its definition. Since a nontrivial definition contains more than one symbol, replacing a defined symbol by its definition has type C processing requirements.
It should be noted that definitions must exist before execution begins and cannot be created at runtime.
c. Composite Operators
When the operator of a reducible application is composite (i.e., a sequence), the way in which the evaluation proceeds depends on whether the first element of the sequence is regular or meta. If the first element of the sequence is an atomic symbol, then whether it is regular or meta is part of its definition. If the first element of the sequence is a sequence, then it is meta if its first element is ~, and otherwise it is regular.
1. Regular Composition
If the first element of a composite operator is regular, we decompose it with the help of the following rule, called regular composition:
<(c1, c2, ..., cn), d> → <c1, <(c2, c3, ..., cn), d>>
The above reduction rule reveals that the processing requirements of regular composition can be characterized as type C, since we have to create two new symbols between c1 and c2.
2. Meta Composition
If the first element of a composite operator is meta, it is decomposed with the help of the following rule, called meta composition:
<(c1, c2, ..., cn), d> → <c1, ((c1, c2, ..., cn), d)>
Since c1 (whatever expression it is) must be duplicated, the processing requirements are to be classified as type C. (In fact, if c1 happens to be a primitive meta operator, there is no need to go through this step; this is demonstrated in Figures 3 and 6 with the meta operators AL and AA.)
3. Order of Evaluation
The definition of reduction languages allows the reduction of innermost applications to take place in any order, owing to the so-called Church-Rosser property of these languages.
Since innermost applications are disjoint in our chosen internal representation, this representation allows all innermost applications to be reduced concurrently. This is because the outcome of the reduction is determined solely by the operator and operand expressions, and nothing from the rest of the program text can influence it.
4. Locating Innermost Applications
Before processing of innermost applications can take place, they must be located in the program text. This process is somewhat complicated by the fact that there is no bound either on the number of innermost applications that may exist simultaneously in the program text, or on the length of an innermost application.
An application symbol whose level number is i is the left end of an innermost application if (1) it is the rightmost application symbol, or (2) the next application symbol to its right has a level number less than or equal to i, or (3) there exists a symbol with level number less than or equal to i between the application symbol and the next application symbol to its right.
If an application symbol with level number i is known to be the left end of an innermost application, then the entire application consists of the application symbol itself and the sequence of contiguous symbols to its right whose level numbers are greater than i.
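The two rules above (locating a left end, then extending to the right) can be sketched as follows; the symbol "@" standing for the application symbol and the list-of-(symbol, level) pairs representation of L are illustrative assumptions.

```python
APP = "@"   # stands for the application symbol; the glyph is illustrative

def innermost_applications(text):
    """text: list of (symbol, level) pairs, one per occupied cell of L.
    Returns (start, end) index pairs of the innermost applications."""
    apps = [i for i, (s, _) in enumerate(text) if s == APP]
    found = []
    for k, i in enumerate(apps):
        level = text[i][1]
        if k + 1 < len(apps):
            j = apps[k + 1]
            # i is NOT a left end only if the next application symbol is
            # deeper nested and no symbol at level <= i separates them
            low_between = any(text[m][1] <= level for m in range(i + 1, j))
            if text[j][1] > level and not low_between:
                continue
        # i is the left end; the application extends over contiguous
        # symbols to the right with level numbers greater than level
        end = i + 1
        while end < len(text) and text[end][1] > level:
            end += 1
        found.append((i, end - 1))
    return found
```

For example, in a text where an application at level 1 is nested inside one at level 0, only the inner one is reported.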

B. Description of Network Processors
1. Interconnection Pattern of Cells
An architecture will now be described in conjunction with the drawing, which will include description of an apparatus and a method for evaluating program expressions in applicative languages.
Referring now to Figure 8, a block diagram of a network of cellular processors according to the present invention is shown. Figure 8 shows that the processor 1000 is a cellular network containing two kinds of cells interconnected in a highly regular manner. Cells 1400 form a linear array (they are indicated by rectangles in the diagram), and they normally hold the program text as described in Section IV.A.1. Cells 1500 form a full binary tree (they are indicated by triangles in the diagram), and they perform processing functions, act as a routing network, etc. The linear array of cells 1400 will be referred to as L, and the tree network as T. Throughout, the root cell of T is assumed to act as the I/O port of the processor (see Section B.7). However, it must be understood that the input/output may be performed directly on the cells 1400 of L by a parallel transfer which transfers into each cell 1400 of L the appropriate symbol of the program text.
Since L holds the program text one symbol per cell, a network of practical size will comprise a large number of cells.
Because of this, we note here an important and very attractive property such networks have: the total number of cells in the network is a linear function of the length of L. More precisely, if n is the height of the tree of cells, then the length of L is 2**n, and the number of cells in T is 2**n - 1, so the total amount of hardware is almost exactly 2**n(l + t), where l and t represent the amounts of hardware built into a single cell of L and T, respectively.
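The arithmetic is easily checked; the value of n below is illustrative.

```python
n = 10                      # height of the tree T (illustrative value)
L_cells = 2 ** n            # cells in the linear array L
T_cells = 2 ** n - 1        # cells in the full binary tree T
# total hardware is about 2**n * (l + t) for per-cell amounts l and t,
# i.e. linear in the length of L:
assert L_cells + T_cells == 2 * L_cells - 1
print(L_cells, T_cells)     # prints 1024 1023
```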

2. The Partitioned Processor
A prerequisite of simultaneously reducing all reducible applications (hereinafter RA's) is to guarantee that different RA's are not able to interfere with each other. One way to achieve this is to make sure that different RA's are processed in disjoint portions of T. (These disjoint portions of T will comprise disjoint assemblies of processors and interconnections as described below.) We will describe a processing mechanism in which each RA "sees" a binary tree above it, which processes the RA as if it were alone in the processor. The scheme we describe requires that some cells of T serve more than one RA simultaneously.
One way to accomplish this is to make it possible for the hardware of the cells of T to be divided into parts (in a way determined by the contents of L), so that when a cell of T serves more than one RA, each RA uses a different part. The process and the result of separating a cell of T into parts will be called the partitioning of that cell. The process and the result of partitioning all the cells of T is called a partitioning of the processor.
In this section we describe, with the help of a symbolic notation, what the partitioned processor is like. Later, in Section B. 6. a., we present some of the details of the process of partitioning.
At the core of the partitioning process is the execution of the algorithm of Section A.4., done simultaneously for all applications. There are two steps in this process:
(1) locating left ends of applications (as described in Section A.4.) and subsequently dividing the processor into so-called areas;
(2) locating right ends of innermost applications (as described in Section A.4.), and subsequently transforming some of the areas into so-called active areas.

First we describe the processor as partitioned into areas. Assume that we start out with a representation of the processor as shown in Figure 8. We modify it by erasing all connections shown in Figure 8, and place symbols of the reduction language program into cells 1400 of L. In this symbolic notation an area will be a binary tree whose leaves are in cells 1400 of L (there is exactly one such leaf in each cell 1400 of L, whether that cell 1400 holds a symbol of the reduction program or not), and whose nodes which are not leaves are in cells 1500 of T. There will be a distinct area associated with the leftmost cell 1400 of L and with each application symbol not in the leftmost cell 1400 of L.
The following terminology will be used. The index of a cell 1400 of L is an integer indicating its position in L from left to right. Let i(1) = 1, and let i(2), ..., i(q) be the indices of cells 1400 of L (other than the leftmost one) holding the application symbol, with i(m) < i(n) whenever m < n.
Depending on the value of j, the jth area is the smallest tree such that:
for 1 < j < q, (1) the leaves of the tree are in the cells of L indexed from i(j) to i(j+1)-1, (2) the top node (root) of the tree is in the lowest node of T which has both the i(j)-1st and the i(j+1)th cells of L as descendants;
for j = 1, (1) the leaves of the tree are in the cells of L indexed from i(1) to i(2)-1, (2) the top node (root) of the tree is in the root of T;
for j = q, (1) the leaves of the tree are in the cells of L with indices greater than or equal to i(q), (2) the top node (root) of the tree is in the root of T.
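Under the indexing convention just given, the leaf ranges of the q areas can be sketched as follows; the function name is illustrative, and the placement of area roots within T is omitted.

```python
def area_leaf_ranges(app_indices, length):
    """app_indices: 1-based indices i(1), ..., i(q) of cells of L that
    delimit areas (i(1) = 1 stands for the leftmost cell of L).
    length: number of cells in L.
    Returns the (first, last) leaf index range of each of the q areas."""
    ranges = []
    for j, i_j in enumerate(app_indices):
        if j + 1 < len(app_indices):
            # leaves from i(j) to i(j+1) - 1
            ranges.append((i_j, app_indices[j + 1] - 1))
        else:
            # the qth area runs to the right end of L
            ranges.append((i_j, length))
    return ranges
```

For instance, with application symbols in cells 3 and 6 of an eight-cell L, the three areas cover cells 1-2, 3-5, and 6-8.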

Figure 10 shows an example of a partitioned processor containing six areas. Seven of the eight possible partitioning patterns shown in Figure 9 appear in this example. The elements of the symbolic notation we are using can be interpreted as follows. All branches of areas correspond to communication channels of identical capabilities, capable of carrying information both ways simultaneously. Whenever only one branch is shown between two cells, we may assume that the second channel is idle. Each node of an area corresponds to some fixed amount of processing hardware. Whenever a node of an area has two downward branches, the corresponding node hardware may perform processing that is immediately comprehensible in terms of the reduction language program. (Note that a cell of T may hold at most one node with two downward branches.) For example, in Figure 10, the node with two downward branches in cell 105 will multiply the symbols 1 and 11 in the program text, 2 and 12 will be multiplied in cell 12~, 3 and 13 will be multiplied in cell 127, and 4 and 14 will be multiplied in cell ~5. Other functions of such nodes, and the role of nodes with one downward branch, will be described later. The top cell 101 of the area will serve as its I/O port; the I/O channels with which it connects are not considered here, but will be taken up in Section B.7.
Finally we note that since the root of an area is in a cell 1500 of T which has among its descendants the cell of L holding the next application symbol on the right (if one exists), all the necessary information can be made available at the root node of each area to determine whether or not the area contains an RA.
Once the processor is divided up into areas, the algorithm to locate RA's is executed at the root of each area. Since the leaves of an area holding an RA contain all cells of L up to the next application symbol or to the right end of L, some of the rightmost leaves of this area may hold symbols of the reduction language text that are outside the RA. Locating such leaves, and separating them from the area (thereby transforming the area into an active area) is the second part of the partitioning process. (The active area is obtained by cutting off certain subtrees of the original area. As a result, the active area is a binary tree, too.) In our example in Figure 10, there are four RA's, but none of them requires this process.
It should be pointed out that partitioning the T cells (a spatial separation) could be replaced by time multiplexing (a temporal separation), leading to a possibly slower, but cheaper, implementation of the same system.
3. Organization, States and State Changes
In a global sense the operation of the processor is determined by the applicative language program placed into L 200. (See Figure 10.) Since there are no functions dedicated to cells of T and L, the operation of each cell will be "data-driven", i.e., in response to information received from its neighbours. The activities of cells will be coordinated by endowing each cell of the processor with copies of the same finite-state control, which determines how the cell interprets information received from its neighbours. Whenever a cell of T gets partitioned, each independent part (there are at most four of them, each corresponding to a distinct node of an area) must have its own finite-state control. On the other hand, a cell of L needs only one such control, since it never gets partitioned.
The state of a node of an area will change whenever either its parent or both its children change state. The actual pattern of state changes will be the following: the root cell of T will initiate the process by the nodes contained in it changing their states; as a result, its son cells in T


will change state, and these changes, in turn, initiate similar changes on the next lower level of T. When this wave of changes reaches L, the state changes in the cells of L initiate changes in the bottom level of T, which, in turn, cause changes in the next higher level of T, etc.
Figure 11 shows a representation of the state changes for a processor in which L has eight cells, partitioning produced a single area, and hence all the cells go through the same state changes. All conditions (represented by circles) have the following interpretation: "in the given cell state change p → q is taking place." The distribution of tokens in the net (showing the holding of certain conditions) illustrates that these state changes can take place at their own pace, but they always get synchronized in the process of approaching the root cell of T.
In order to simplify our presentation, we shall assume that the state changes take place simultaneously on any level of T. This will allow us to talk about upward and downward cycles, indicating which way the state changes are propagating.
Figure 12 shows a fragment 100 of a processor in the middle of a downward cycle. The cells are partitioned, and four different states (3, 33, 4, and 34) can be found in the diagram. The diagram illustrates the following point: since the reason for having the finite-state control in the cells is to coordinate related activities in the processor, all nodes of an area go through the same state changes (e.g., in Figure 12 all nodes of the active area on the left are going through the change 3 → 4, whereas all nodes of the active area on the right are going through 33 → 34), and as a result, in general no useful purpose is served by talking about the state of the processor as a whole.
Figure 13 shows the state diagram of the nodes of areas (and also that of cells of L) for the processor we are describing. Here only the following observations are made:
(1) although different cells (or parts thereof) may be in different states at any moment, we can always say (thanks to our simplifying assumption made above) that all cells (or parts thereof) on the same level of T are in states that are in the same column of the state diagram (for an example see Figure 12);
(2) we shall use the expression k+i to denote all states in a column of the state diagram, where i is the smallest label in the column and k may assume values from 0, 10, 30, and 50, e.g., k+4 where k = 0, 30, 50, k+7 where k = 0, 10, 50, or k+12 where k = 0, 10, 30, 50; (3) odd numbered states are entered in upward cycles and even numbered states are entered in downward cycles; (4) there are three (specially marked) states in the diagram (states 5, 19, and 61) with more than one successor state; in these states the successors are always chosen deterministically, with the help of conditions that are not visible on the level of the state diagram; (5) in cells of L, the state belongs to the contents of the cell, not to the cell itself; hence the state information moves with the contents of the cell during storage management (see Section B.6.f.2);

(6) since the state diagram describes the states of the nodes of areas, during state transition k+14 → k+1, when partitioning takes place and hence areas go out of and come into existence, some additional rules are needed; in states k+14 each node of each area changes its state to the undefined state, with the exception of the leftmost cell of L and the cells of L holding an application symbol, and these cells of L, in the process of repartitioning the processor, determine the states of the newly formed areas (the state transitions to and from the undefined state are not shown in the state diagram of Figure 13);
(7) when the program text is first placed into L, the state of each symbol in it is 1; (8) the state diagram is cyclic; the successors to states k+14 are states k+1.

The state diagram specifies the overall organization of the processor. The organization, and hence the state diagram, chosen for description is just one of many possible alternatives; the main criteria in its selection were that it be easy to describe, and still able to illustrate well the advantages of a processor of this kind.
Since it is the state of the node of an area which determines what processing activities that node performs, and the states of the nodes of an area are closely coordinated (all nodes of the area go through the same state change in each upward and downward cycle), the processing activities performed by an area in certain states (or certain groups of states) can be classified as fitting one of several global patterns. We choose to distinguish three such patterns, and call them modes of operation.
In modes I and II, information is sent along paths between L and the root cell of T, usually inside areas, but possibly also across area boundaries (examples of the latter are partitioning, preparation for storage management, and detecting the end of storage management). In mode III, information is sent only between cells 1400 in L.
Modes I and II are distinguished because they treat information items moving upward differently. In a mode I operation, (1) whenever a node of an area (or a cell of T) receives two information items from below, it produces one information item to be sent up to its parent node; (2) the output item is produced by performing some kind of operation on the two input items, such as adding two numbers (see combining messages, Section B.6.d.), taking the leftmost three of six arriving items (see partitioning, Section B.6.a.), or the considerably more complex operation of preparing the directory (see Section B.6.b.); (3) since each subtree of the area (or of T) produces a single value, this value can be stored in the root node of a subtree and can be used to influence a later phase of processing; (4) if the data paths are wide enough, the node of the area (or of T) is able to receive both its inputs in one step, hence able to produce its output in one step.
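A mode I upward sweep can be sketched as follows, assuming a full binary tree (a power-of-two number of leaves) and an arbitrary combining operation; the per-level partial results correspond to the values stored in the root nodes of subtrees, which a later downward cycle may consult.

```python
def mode_I_upward(leaves, combine):
    """Propagate one item per leaf up a full binary tree, combining pairs
    at each node. Returns the rows of partial results, level by level,
    from the leaves up to the single value produced at the root."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        row = levels[-1]
        levels.append([combine(row[i], row[i + 1])
                       for i in range(0, len(row), 2)])
    return levels

# e.g. summing eight leaf values in three upward steps
levels = mode_I_upward([1, 2, 3, 4, 5, 6, 7, 8], lambda a, b: a + b)
print(levels[-1][0])   # prints 36
```

Any associative combining operation (addition, maximum, "leftmost of the arriving items") fits the same pattern.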

In a mode II operation, (1) whenever a node of an area receives two information items from below, it produces two information items to be sent up to its parent node; (2) the two output items are the same as the two input items, and the order in which they appear on the output may or may not be of consequence; (3) the higher up a node is in the area, the larger the number of information items that will pass through it, and as a result, the time required for a mode II operation is data dependent. Because of this queuing phenomenon, and because the size of information items may vary considerably, the natural way to control a mode II operation is with the help of asynchronous control structures, via ready and acknowledge signals. The mode II operations are: (1) bringing in microprograms; (2) sending messages, data movement, and I/O; (3) preparation for storage management (in states k+9 and k+10).
Modes I and II also differ in the ways they treat information items moving downward, but these differences are consequences of the primary distinction between them. In a mode I operation, (1) the node of the area (or of T) produces both a left and a right output item in response to the one input item, and they may be different, depending on what was left in the node by the previous upward cycle (an example of this is the process of marking expressions; see Section B.6.c.); (2) since during the previous upward cycle the top of the area produced a single item, during the next downward cycle only a single item (not necessarily the same) will be arriving at each cell of L in the area; (3) the processes of moving information up and down in the area do not overlap in time.
In a mode II operation, (1) the node of the area produces two output items in response to the one input item, and they are always identical (the item is being broadcast to each cell of L in the area); (2) every item that passes through the top node of the area is broadcast separately to cells in L, and the latter are free to accept or reject any of the items arriving at them; (3) the processes of moving information up and down in the area overlap in time, and hence each branch of the area must be able to carry information items in both directions simultaneously.
In summary, modes I and II can be compared and contrasted as follows. In a mode I operation, by propagating information upwards in the tree simultaneously from all cells of L, the global situation in L is evaluated and the partial results of this evaluation are stored into the nodes. Next, by propagating information downwards in the tree and by using the partial results stored in the nodes, each cell of L can be influenced separately and differently. In a mode II operation, on the other hand, the area functions as a routing, or interconnection, network, and typically delivers information items from L back to L by passing them through the root node of the area.
Mode III is characterized by the fact that only cells of L participate in the processing, and adjacent cells of L communicate with each other directly. The only mode III operation is storage management (see Section B.6.f.2).
4. Cell Organization
In this section we outline the processing and storage capabilities that a typical cell of L and T must have. When describing certain components of these cells, we shall often refer to details that are explained only in subsequent parts of Section B. Hence, this section can be fully understood only after reading the rest of Section B.
a. L Cells
Figure 14 shows a cell 1400 of L. The names of the registers appearing in Figure 14 are used in the rest of Section B to explain how the processor operates.
The state control 1402 has the ability to store the current state, and compute the next state corresponding to the state diagram of Figure 13; this state information belongs to the contents of the cell, not to the cell itself.
CPU 1404 has the ability to execute segments of microprograms, and to perform processing related to storage management (see Section B.6.f.) which is not explicitly specified in microprograms.
Microprogram store is capable of storing a certain number of microinstructions. This is necessary because certain microinstructions cannot be executed immediately upon receiving them, since some of their operands are not yet available.
Condition registers 1408 store status information about the contents of the cell; for example, whether the cell is full or empty, and whether the contents of the cell are to move during data movement.
Local Storage 1410 contains the following registers:
S 1412 holds a single symbol of the reduction language text.
S 1414 holds another symbol of the reduction language, with which S 1412 is to be rewritten at the end of processing the RA.
ALN 1416 holds the absolute level number of the symbol of the program text; this is obtained by considering the contents of L as a single expression of the reduction language and assigning to ALN the nesting level of the symbol in question.
RLN register 1418 stores the relative level number of a symbol in an RA. This is obtained by assigning to RLN the nesting level of the symbol with respect to the RA.
RLN 1420 stores the value with which RLN 1418 is to be rewritten at the end of processing the RA.
Marker 1 1422, and marker 2 1424, are set by the microprogram and used to mark all symbols of an expression.

Whenever a microinstruction "mark with x" is executed in a cell of L, MARKER 1 1422 receives the value of "x", and if RLN 1418 has a certain value, MARKER 2 1424 receives "x", too.
Symbols of the marked expression are indexed, beginning with one, and these index values are placed in N1 1426. The largest index value, which is the total number of symbols in the expression, is placed in L1 1428. When a symbol occurs in a marked expression, the contents of POS# 1430 (mnemonic for position number) for each symbol is set as follows: the marked expression is considered a sequence, and all symbols of the expression which is the ith element of this sequence receive the value i in their POS# register 1430. The largest value of POS#, which is the length of this sequence, is placed in register LENGTH 1432 of each symbol. Also, each expression which is an element of the sequence is indexed separately; the index values are placed in N2 1434, and the total number of symbols in the element expression is placed in L2 1436. (N2 and L2 play the same role for the element expression as N1 and L1 do for the whole marked expression.) The L/R register 1438 holds the value "left" (or "right") if the symbol contained in S 1412 is the leftmost (or rightmost) symbol of one of the elements of a marked sequence.
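The register settings just described can be sketched as follows, assuming the marked expression is given as a sequence of elements, each a list of its symbols. The register names follow the text; the dictionary representation and the treatment of one-symbol elements (marked "left" here) are illustrative assumptions.

```python
def marking_registers(elements):
    """elements: the marked expression as a sequence of elements, each a
    list of its symbols (the enclosing sequence symbol is ignored here).
    Returns one register dictionary per symbol, mirroring the
    N1/L1, POS#/LENGTH, N2/L2 and L/R registers described above."""
    total = sum(len(e) for e in elements)     # total symbols -> L1
    regs, n1 = [], 0
    for pos, elem in enumerate(elements, start=1):
        for n2, sym in enumerate(elem, start=1):
            n1 += 1                           # index in whole expression
            regs.append({
                "S": sym, "N1": n1, "L1": total,
                "POS#": pos, "LENGTH": len(elements),
                "N2": n2, "L2": len(elem),
                # a one-symbol element is reported as "left" here; the
                # text leaves this boundary case open
                "L/R": "left" if n2 == 1 else
                       ("right" if n2 == len(elem) else None),
            })
    return regs
```

For example, marking the two-element sequence (ab, c) gives three symbols with L1 = 3 and LENGTH = 2.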
M1 1440, M2 1442, M3 1444, and M4 1446 are called message registers: send statements generate messages that may have one, two, three, or four components, and on arrival at the cell 1400 of L, they are placed in M1 1440, M2 1442, M3 1444, and M4 1446, respectively.
BL 1448 and BR 1450 contain non-negative integers, and are used during storage management; the cell 1400 of L in question will be entered on the left by the contents of BL 1448 cells of L, and on the right by the contents of BR 1450 cells of L.
Of these registers, BL 1448, BR 1450, and state control 1402 are used in every cell 1400 of L; S 1412 and ALN 1416 are used in every occupied cell of L; and all the other
registers are used only by occupied cells internal to an RA
or by cells reserved during storage management. State control 1402 controls the flow of information between the buffer registers 1460, 1470 and 1480, which act to buffer information being transmitted into or out of cell 1400, to local storage 1410, which consists of registers 1406 through 1454 inclusive, and to CPU 1404. Buffer register 1470 buffers information being transmitted to and from the T cell 1500 associated with the L cell 1400 under consideration. Buffer registers 1460 and 1480 transmit information to and from adjacent L
cells 1400 when there are interconnections between cells 1400 in the L array.
The size of the registers 1412 through 1450 in the local storage 1410 of each cell is determined by the size of S and ALN, and by the size of L, because (1) RLN need not be any larger than ALN, (2) components of messages may be either symbols of the program text or relative level numbers, and (3) the number of cells in L is a bound on the values that BL 1448, BR 1450, N1 1426, L1 1428, POS# 1430, LENGTH 1432, N2 1434, and L2 1436 have to hold.
The following considerations are relevant to determining the size of S 1412; S must be able to hold (1) syntactic markers, such as ( or ~, (2) symbols whose meanings are specified by the language definition, such as T (true), F (false), 0, and numbers, (3) names of primitives of the reduction language (each has its effect defined by a microprogram), and (4) symbols to designate user-defined operations (for which a definition must be created before execution begins).
Register ALN 1416 must be able to hold the absolute level number of any symbol of the program text, and consequently the size of ALN 1416 is in a simple relation with the size of the processor. Assume that T 100 is a full binary tree, and the height of the tree network is n, in which case the number of cells in L is 2**n. Assume further, as a worst case, that L contains a single constant expression such that the rightmost cell 1400 of L holds an atomic symbol and all other cells of L hold a sequence symbol. The maximum nesting level in this expression is 2**n - 1, and so ALN must have n bits to accommodate it.
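The worst-case arithmetic can be checked directly for an illustrative n:

```python
n = 5                                  # tree height (illustrative value)
L_len = 2 ** n                         # cells in L
# worst case from the text: 2**n - 1 sequence symbols, then one atom
max_nesting = L_len - 1
assert max_nesting < 2 ** n            # so n bits suffice for ALN
print(max_nesting.bit_length())        # prints 5
```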
b. T Cells
A cell 1500 of T 100 is shown schematically in Figure 15. The components of this diagram can be explained as follows:
(1) R1 1501 through R6 1506 stand for identical groups of registers. These registers serve two functions: as input and output ports in the process of communicating with other cells 1500, 1400 of T and L, and as local storage for P1 1511 through P4 1514.
(2) P1 1511, P2 1512, P3 1513, and P4 1514 are the processing components of the cell 1500; each one may belong to a different area of the processor 1000. All have identical processing capabilities, the same amount of local storage, and identical state control units (similar to state control 1402 in Figure 14). They must be able to perform the processing required by the internal mechanisms, described at length in the remaining part of Section B.
Figure 15a shows a block diagram of each processor Pn. Each Pn includes an ALU 1562 and a microprogram control unit 1564.
(3) The lines 1521-1526, 1532-1544 connecting the register groups 1501-1506 and the processing components 1511-1514 represent communication channels of identical capabilities, capable of carrying information both ways simultaneously. (Channels may be selected as required.) Not all channels are used all the time: Figure 9 specifies the eight possible partitioning patterns of the cell. The partitioning control in Figure 15 determines which partitioning configuration is assumed by the cell 1500.
(4) Information paths 1551, 1552, 1553, 1554, 1555, 1556 originating in R1 1501 and R2 1502, in R3 1503 and R4 1504, and in R5 1505 and R6 1506 lead to the parent node, left son node, and right son node, respectively. Each line represents a communication channel capable of carrying information both ways simultaneously.
In the rest of Section B, the processing activities performed by cells 1500 of T are described with the help of temporaries. Again, we do not show an explicit mapping between these temporaries and the components of T, because many different mappings are possible.
5. Microprogramming Language.
In this section we describe a simple language capable of specifying all the computational requirements outlined in Section A. Since it is closer in style to a conventional microprogramming language than to a machine language, we shall refer to it as a microprogramming language.
Type A processing, which we described in Section A, can be performed in cells of L alone, and we choose to implement it by executing suitable microprograms in cells of L. Type B and C processing requirements are more complex, and we implement them by executing suitable microprograms in cells of L
which, in turn, can initiate processing activities in cells of T.
Figure 16 will help to introduce some terminology.
It shows the RA already discussed in the context of Figure 4, and it indicates (in plain English) the processing activities that must be performed to bring about the effect of the reduction rule in question.
The totality of processing activities required by an RA, and expressed in the microprogramming language, will be called a microprogram. A microprogram is made up of segments specifying processing required by single symbols (atomic symbols or syntactic markers) or well-formed expressions of the RA.

A segment comprises a sequence of microinstructions.
Microprograms specifying the effects of operator expressions reside outside the processor. When a reducible application is located, the appropriate microprogram is sent in via the root cell 101 of T. Section B.7 describes how it gets from the top of the active area to cells of L holding the reduction language program text. If the RA is well-formed, every symbol of it receives a segment of a microprogram, and only such cells receive microinstructions.

a. Sample Microprograms
We now introduce some of the details of the microprogramming language by means of three examples.
For ease of understanding and to avoid the need to specify low-level details of the internal representation that are irrelevant here, we have chosen a representation for the microinstructions with an ALGOL-like appearance.
Since segments of microinstructions apply to constituents of the operator and operand, and these constituents form blocks of a partition of the RA in L, we arrange the segments of the microprogram in a linear sequence so that the order of the segments will match the order of the corresponding constituents in L. Because of this simple positional correspondence, the only information that has to be attached to any segment of the microinstructions is the description of the constituent to which it applies (e.g. a single symbol with a given level number), and we shall call such a description a destination expression.
Our first example, shown in Figure 17, is a microprogram for the primitive meta operation AL.
The destination expressions of this example show the ways the microprogramming language deals with those aspects of the RA that become known only at runtime.
(1) The microprogram is written with the assumption that the level number of the application symbol is 0. Whenever the RA is located, these so-called relative level numbers--RLN for short--are computed for each symbol by subtracting the true, or absolute, level number--ALN--of the application symbol from the ALN of the symbol in question.
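The RLN computation just described can be sketched in a few lines. This is an illustrative model only, not the patent's circuitry; the representation of symbols as (symbol, ALN) pairs and the function name are ours.

```python
# Sketch: computing relative level numbers (RLNs) for the symbols of an RA.
# Each symbol carries an absolute level number (ALN); when the RA is located,
# each symbol's RLN is its own ALN minus the ALN of the RA's application
# symbol, so the application symbol itself receives RLN=0.

def relative_level_numbers(symbols, app_aln):
    """symbols: list of (symbol, ALN) pairs belonging to the RA,
    app_aln: ALN of the RA's application symbol."""
    return [(sym, aln - app_aln) for sym, aln in symbols]

# A hypothetical RA whose application symbol sits at absolute level 3:
ra = [("(", 3), ("<", 4), ("AL", 5), ("x", 5), ("<", 4), ("y1", 5)]
print(relative_level_numbers(ra, 3))
```

The application symbol comes out at RLN=0 and more deeply nested symbols at larger RLNs, which is exactly the numbering the destination expressions refer to.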
(2) The destination expression (E/i) indicates that the same segment of microprogram is to be sent to all symbols of a well-formed expression whose leftmost symbol has RLN=i. (The size of this expression will generally be unknown prior to execution. Section B.6.b describes how all symbols of this expression are located.) The destination expression (S/i) indicates that the segment is to be sent to a single symbol with RLN=i.
In the microinstructions, S, RLN, and POS# refer to registers of the cell 1400 of L executing the microinstruction in question (see B.4.a). The numeric labels in front of statements indicate the state of the cell of L in which the statement in question should be executed. Statements within a segment are executed in their order of occurrence. Some statements (e.g. erase) need no label because their time of execution is determined in some other way.
The phrase "mark with x" activates the only available mechanism to analyze a sequence into its components. As a result of executing this statement, the whole marked expression is considered a sequence, and its component expressions receive a number in their POS# register indicating their position in the sequence, and their LENGTH registers receive a number indicating the number of elements of this sequence.
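The net effect of the mark statement on the POS# and LENGTH registers can be sketched as follows. This is a minimal model, assuming a marked expression is given as a flat list of (symbol, RLN) pairs whose leftmost symbol has the sequence's own level j; the function name and representation are ours, not the patent's.

```python
def index_marked_sequence(expr):
    """expr: list of (symbol, rln) pairs; expr[0] is the leftmost symbol
    of the marked expression, so its rln is the sequence's own level j.
    Components of the sequence are the subexpressions whose leftmost
    symbol has rln j+1. Returns one (symbol, POS#, LENGTH) triple per
    remaining cell, mirroring the registers the mark statement sets."""
    j = expr[0][1]
    pos = 0
    positions = []
    for sym, rln in expr[1:]:
        if rln == j + 1:          # leftmost symbol of a new component
            pos += 1
        positions.append((sym, pos))
    length = pos                   # number of elements of the sequence
    return [(sym, p, length) for sym, p in positions]
```

For a three-element sequence, every cell of the i-th component receives POS#=i and LENGTH=3, which is what the microprograms below rely on.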
(The full effect of the mark statement is explained below, whereas the process of marking an expression, which begins in state 5 and ends in state 8, is described in B.6.c.) With these comments the microprogram for AL should now be readable. It says: the leftmost symbol of the RA should have RLN=0, and this symbol should be changed to <. We need not verify that this symbol is the application symbol, since the RA is located on the basis of its being one. The next symbol from left to right must be < with RLN=1; if it is, it is changed as Figure 17 specifies; otherwise we signal an error. The next symbol--whatever it is--should have RLN=2, and it should be erased. (Again, we know it is AL, since the microprogram was brought in on that basis.) The next expression, which is the parameter of the AL operator, should be left alone (its leftmost symbol must have RLN=2). Following that is the operand expression, whose leftmost symbol must have RLN=1. We erase this leftmost symbol if it is <; otherwise we signal an error. In addition, all component expressions of the operand with the exception of the first will have their RLN reduced by one.
We introduce some further details of the microprogramming language by showing a microprogram for the primitive AND in Figure 18.
This example introduces what can be called the message mechanism, providing a means of communication among cells of L during execution, which is the chief requirement of type B processing. A variety of send commands exists for the purpose of broadcasting information in active areas. A message sent by a send1C statement moves up the area simultaneously with the state change 4 → 5. The command in our example has the form "send1C (binary operator, operand)", which causes the operands to be combined according to the binary operator as they move up in T. Hence only one message (containing the result) reaches the top of the active area; that message is broadcast down to every cell of L in the active area. Any cell can pick up the result in its M2 register, but in our example only one cell is programmed to do so, by means of the statement S:=M2(1).
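The combining behaviour of send1C can be sketched as a level-by-level fold. This is our illustrative model, not the machine's wiring: it assumes a balanced pairing of messages, which gives the same result as any tree shape of T whenever the combining operator is associative (as it is for AND, addition, or maximum).

```python
def send_combined(leaf_messages, binary_op):
    """Model of a sendiC statement: messages from cells of L are combined
    pairwise at nodes of T on their way up, so a single result reaches
    the top of the area; it is then broadcast back to every cell."""
    level = list(leaf_messages)
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level) - 1, 2):
            nxt.append(binary_op(level[i], level[i + 1]))
        if len(level) % 2:            # an unpaired message moves up unchanged
            nxt.append(level[-1])
        level = nxt
    result = level[0]
    return [result] * len(leaf_messages)   # broadcast into every M2 register

# e.g. the AND primitive: each operand cell sends its truth value
print(send_combined([True, True, False], lambda a, b: a and b))
```

Every cell of L in the area ends up holding the combined value, but (as in the AND example) typically only one cell is programmed to store it.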
A microprogram for AA, shown in Figure 19, will illustrate how type C processing is specified.
This microprogram implements AA as shown in Figure 7: the originally existing copy of f is left in place and becomes the operator applied to y1, and n-1 additional copies of the application symbol and of f are created in front of y2 through yn.
The operand expression is marked with y; this causes the elements of the operand sequence to be indexed by setting the values of the POS# registers. Thus if a symbol appears in the i-th element of the operand sequence, the POS# register of the cell which holds the symbol will be assigned the value i. We insert an application symbol on the right of y1 through yn-1 by writing "insertS (right, (, 1)", and insert a copy of f on the left of y2 through yn by writing "insertE (left, x, +0)". In the latter case x is the symbol with which we marked every symbol of the expression f, and since the information we are inserting comes from the source text and not from the microprogram, we give an increment (+0) to the original RLN instead of a new value for RLN.
The insert commands result in insertions adjacent only to the leftmost or rightmost cell of the expression to which they apply. Information to control where the insertion is made is in the L/R registers of an expression, placed there in the process of marking the expression. Consider, for example, the statement "if POS# < LENGTH then insertS (right, (, 1)". This is received by every symbol of the operand.
The condition holds only in cells containing symbols of y1 through yn-1. Moreover, we do not want to perform insertions next to each symbol of these expressions, only at the right end of their rightmost symbols. The command "insertS (right, (, 1)" will be executed only in cells whose L/R register contains the value "right".
b. Description of Microprogramming Language.
The microprogramming language we are to describe is capable of expressing the computational requirements of a large number of primitives. It has been used to write microprograms for many primitives, including most of those considered by prior art publications. Although the language has an ALGOL-like appearance, the simplicity of the constructs allows a very concise encoding into an internal representation.
A segment of a microprogram is composed of a destination expression, followed by a sequence of labelled or unlabelled statements. The permissible destination expressions are S/i and E/i with 0 ≤ i ≤ 3, implying that beyond relative level number three we cannot distribute different microinstructions to different expressions (the reason for this restriction will be explained in Section B.6.b).
Every statement should be preceded by a numeric label, unless (1) it is a mark, erase, or no-op statement, (2) it is one of the arms of a conditional, (3) it is a send statement other than send1(...) or send1C(...), or (4) it uses some of the message registers (M1 through M4). Any integer used to designate a state in the state diagram (Figure 13) can appear as a label of a statement.
The conditional has the form: if <predicate> then <statement> else <statement>. Neither arm of a conditional may be another conditional, or a mark statement. The predicate is formed from relational expressions, with the help of Boolean operators, assuming certain reasonable length restrictions. In a relational expression the usual relational operators (=, ≠, <, ≤, >, ≥) may compare constants, contents of any of the registers of the cell of L, or values of arithmetic expressions formed thereof (again assuming certain length restrictions).
In an assignment statement, on the left-hand side one can write only S or RLN (all other registers of cells of L are set only in specific contexts, e.g. by some of the other statements), whereas on the right-hand side one can write a constant, the name of any register of the cell of L, or an arithmetic expression formed thereof, assuming that certain length restrictions apply. When all quantities are available, the right-hand side of the assignment statement is evaluated and stored in a temporary register (S' or RLN'); the time of evaluation should be indicated in the statement label, if possible. The assignment itself, however, is executed only at the final stage of the processing of the RA.
The erase statement clears all registers of the cell of L at the end of the processing of the RA.
The send statement is used to send messages to the top of the area, from where they are broadcast to all cells of L that are contained in the area. Sending and processing of different messages can be overlapped in time if the relative order is immaterial. Sequencing is made possible by indexing the messages, and a message with index i+1 is sent only after all messages with index i have arrived at their destination. Indexing is done by using send statements of the form sendi (...), where i=1,2,3, ... The parameters of the send statements shown above are the messages to be sent. The number of parameters varies, but should not exceed some specified value. (We have needed at most four so far, so we choose the maximum to be four.) The messages sent by send1 (...), send2 (...), etc. will not interact with any other messages in T. On arrival back at L the parameters of these send statements are placed into registers M1, M2, M3, and M4 of each cell of L in the area, ready to be used by the microprogram. Since registers M1 through M4 accept every message arriving at the cell of L in question, whenever their names appear in an expression in the microprogram, that expression is evaluated for every message accepted. (M1 through M4 are used most frequently in conditionals, since usually some part of a particular message is sought depending on some condition.) We can distinguish between messages produced by send1 (...), send2 (...), etc. by writing M1(1), M1(2), etc.
As an alternative, we may want the messages to be combined whenever they meet in some cell of T, such as adding them up, or selecting the larger one (see also the microprogram for AND above). Such send statements are written as send1C (...), send2C (...), etc., and their first parameter is the operator specifying the rule of combination. Statements of the forms sendi and sendjC must have i≠j. Moreover, for any value of i, only one operator can be used in statements of the form sendiC. The second, third, and fourth parameters of sendiC statements are to be combined separately according to the specified operator, and their results end up in registers M2, M3, and M4.
All the statements of the form send1 (...) or of the form send1C (...) should be labelled, each with the same label, chiefly to indicate whether the results of the mark statement are needed to generate the messages. Other send statements, i.e., sendi (...) and sendiC (...) where i≠1, should never be labelled.
Any segment can contain only one mark statement, and such a statement cannot be either arm of a conditional. As a result, every cell of L receiving the mark statement will be marked, and furthermore only constituents of the source text that have their own destination expressions can be marked.
The full effect of the mark statement is explained with the help of Figure 20. Registers N1 and L1 make it possible, for example, to write microprograms to compare two arbitrary expressions for equality, or to insert the whole marked expression somewhere else in the program. Registers POS# (position number) and LENGTH allow us to write microprograms to do different things to different elements of the marked sequence, and combined with registers N2 and L2 they allow us to insert the component expressions of this sequence at different places in the program. Finally, register L/R is used to locate the left or right end of any of the component expressions, in order to be able to make an insertion there. (The process of assigning values to these registers is described in B.6.c.)
The insert statement has three variants. InsertS is used whenever a single symbol is to be inserted from the microprogram. Its form is insertS (left/right, symbol, RLN).
The first parameter specifies whether the symbol is to be inserted on the left or on the right end of the expression holding the insert statement in question. The second and third parameters are the symbol to be inserted and its RLN. InsertE is used whenever an expression (possibly a single symbol) is to be inserted from the program text. Its form is insertE (left/right, marker, increment to RLN). The first parameter is the same as in the case of insertS. The second parameter identifies the symbol or expression to be inserted, which must have been marked. InsertC is used whenever a component of a marked sequence is to be inserted. Its form is insertC (left/right, marker, POS#, increment to RLN). The third parameter specifies which component of the marked sequence is to be inserted.
Although the microprogramming language described above has some powerful features (especially the send, mark, and insert statements), it is basically a low-level language. It can be used to full advantage only if one understands the operation of the processor to a sufficient degree.
This language often allows several different microprograms to be written for the same primitive. The easiest examples to illustrate this involve some rearrangement of the operand. Consider a primitive EXCHANGE, whose effect is (EXCHANGE : <x, y>) → <y, x>.
It is possible to write a microprogram that leaves the expression x in place, inserts y on its left, and erases the original copy of y from the program text. As an alternative, it is possible to write another microprogram that leaves y in place, inserts a copy of x on its right, and erases the original copy of x from the program text. Since for a short while two copies of y (or two copies of x) must exist in L, it would be desirable to move the shorter one of x or y. Since the lengths of x and y become known only at runtime, a third version of the same microprogram could test the lengths of x and y and move the shorter of the two.
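The runtime choice made by the third version can be sketched as follows; the representation of x and y as symbol lists and the function name are ours, and the sketch models only the decision and the net result, not the insert/erase steps themselves.

```python
def exchange(x, y):
    """Net effect of the EXCHANGE primitive: <x, y> becomes <y, x>.
    x and y are lists of symbols. Also report which expression a
    microprogram should physically move: the shorter one, since it
    must briefly exist in two copies in L."""
    moved = "x" if len(x) <= len(y) else "y"
    return (y, x), moved
```

The lengths are available only at runtime (e.g. via the LENGTH registers after marking), which is why this choice cannot be compiled into the microprogram in advance.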
One more issue that should be briefly mentioned is testing the syntactic correctness of the whole RA. Since the RA may be an arbitrarily long expression, with arbitrarily deep nesting, its syntactic correctness cannot always be fully tested by the processor. However, the following tools are available: (1) the segments of the microprogram must match the corresponding constituents of the program text, otherwise an error message is generated (when the microprogram is distributed, the only thing the processor will have to do is to observe whether there are any segments of the microprogram that find no destination with the specified description, or any occupied cells of L in the active area that received no microinstructions); (2) the microprogram can do some further checking of syntactic correctness with the help of the mark statement and conditionals.
In fact, experience has convinced us that this kind of syntactic checking, in which syntax errors are discovered only when they prevent further processing, is extremely helpful.
6. Internal Mechanisms
Most processing activities are decomposed into elementary, component activities, and these component activities are "hardwired" into the cells of T and L, ready to be activated in certain states of these cells. The program text in L, and the microprograms corresponding to the operators of the innermost applications, only initiate these so-called internal mechanisms and supply parameters for them.
In this section we describe all six of these internal mechanisms. The first two of them, described in 6.a and 6.b, are initiated by the reduction language text in L. The other four are initiated by certain microinstructions: 6.c describes how the mark statement is implemented, 6.d describes the implementation of the send statement, and 6.e and 6.f describe what happens as a result of executing insert statements.

The description of these mechanisms amounts to specifying what an arbitrary cell of L and T does in certain states. Since we want to describe only the essential aspects of these mechanisms and omit the less interesting details, we choose to supply a register-level description specifying how many registers are needed by the cell in a given state, what kind of information they must hold, and how their contents are computed.
The computations are described using various informal means, such as tables, flow charts, algorithms in an ALGOL-like language, and plain English. In describing computations that take place in a node of T (or in a node of an area), the values associated with the chosen node of T (or with the node of the area) are indicated by index 1; those associated with the left and right successor nodes, by indices 2 and 3.
6.a. Partitioning
In section B.2 we have already explained what the partitioned processor is like. Here we describe the process by which the cells of T are partitioned in response to the contents of L.
Partitioning, as indicated in Figure 13, takes place in two successive states in a mode 1 operation. Upon entering states k+1 (k=0,30,50), left ends are located, and on entering k+2 (k=0,30,50), right ends are located. When an innermost application is first located, its area goes through states 1-2-... Partitioning in states 31-32-... and 51-52-... takes place whenever the RA has already completed part of its processing, and has been moved to a different part of L as the result of storage management, where it has to rebuild its area.
(It can rebuild its area because symbols are erased and new symbols created only when it is known that the RA will not have to go through storage management again.) In 31-32-... the requests of the RA for space for insertions have not yet been satisfied; in 51-52-... space has been granted, but there are still messages to be sent or data to be moved.

First we describe what happens in states k+1, and consider state 1 in detail, since it subsumes the cases of states 31 and 51. In the process of locating left ends, by the time information from L reaches the top of the area, we must be able to determine (1) if the area is active, and (2) if so, which microprogram to ask for. The answer to (1) is produced as described in Section A.4, whereas the answer to (2) is specified by Figure 21, in which X, Y, and Z denote the first three symbols to the right of the application symbol. Consequently, to produce the answers to (1) and (2), the following information must be available: (1) the absolute level number (ALN) of the application symbol, (2) the ALN of the next application symbol on the right when it exists (we shall call this the right neighbour of the application symbol), (3) the smallest ALN of any symbol in between, which we denote by MLN, and (4) the three leftmost symbols of the program text between the two application symbols, denoted by X, Y, and Z.
In order to make all this information available in the right place, information in the following format will be passed from each cell of T to its parent cell: (MLN,X,Y,Z, (, (, MLN,X,Y,Z), where the ( entries represent the ALN of certain application symbols. To simplify matters we shall use M to denote the collection MLN,X,Y,Z, and indicate groups of unused registers by a dash.
There are four general cases which describe the information passed upward between two nodes of T, and they all make some statement about the segment of L under the cell of T which sends the information:
F=(--,--,--,M): the segment contains no ( symbol; field 4 refers to the whole segment, which may contain zero or more symbols.
G=(--,--,(,M): the segment contains exactly one ( symbol, which is to be found in the leftmost cell of the segment; field 3 holds the ALN of that symbol; field 4 refers to the rest of the segment, which may contain zero or more symbols.
H=(--,(,(,M): the segment contains at least two ( symbols, and the leftmost cell of the segment contains a ( symbol; field 2 contains the ALN of that symbol; field 3 contains the ALN of the rightmost ( symbol in the segment; field 4 refers to the segment to the right of the rightmost ( symbol, which may contain zero or more symbols.
I=(M,(,(,M): the segment contains at least one ( symbol; the leftmost cell of the segment does not contain a ( symbol; field 1 refers to the leftmost cells of the segment on the left of the leftmost ( symbol; field 2 holds the ALN of the leftmost ( symbol; field 3 holds the ALN of the rightmost (which may also be the leftmost) ( symbol in the segment; field 4 refers to the rest of the segment on the right of the rightmost ( symbol, which may contain zero or more symbols.
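The four cases can be modelled as one record with optional fields, and the upward flow of Figure 22 as a function that merges the summaries of a node's two sons. This is a sketch under our own representation (the names `M`, `Summary`, and `combine` are ours); it shows where the two sub-cases of interest arise, but it does not reproduce the eight partitioning patterns themselves.

```python
from typing import NamedTuple, Optional

class M(NamedTuple):
    mln: int            # smallest ALN in the part of L described
    leftmost: tuple     # up to three leftmost symbols of that part

def combine_m(a: Optional[M], b: Optional[M]) -> Optional[M]:
    """Merge two M fields: minimum MLN, three leftmost of the six symbols."""
    if a is None: return b
    if b is None: return a
    return M(min(a.mln, b.mln), (a.leftmost + b.leftmost)[:3])

class Summary(NamedTuple):
    pre: Optional[M]     # part left of the leftmost "(" (None in F, G, H)
    first: Optional[int] # ALN of leftmost "(" (None in F)
    last: Optional[int]  # ALN of rightmost "("
    post: Optional[M]    # part right of the rightmost "("

def combine(left: Summary, right: Summary) -> Summary:
    """What a node of T computes from its two sons (cf. Figure 22)."""
    if right.first is None:              # right son is case F
        return Summary(left.pre, left.first, left.last,
                       combine_m(left.post, right.post))
    if left.first is None:               # left son is case F
        return Summary(combine_m(left.post, right.pre),
                       right.first, right.last, right.post)
    # Both sons contain "(": here left.last meets its right neighbour
    # right.first, with combine_m(left.post, right.pre) in between --
    # this is where the test of Section A.4 would be applied.
    return Summary(left.pre, left.first, right.last, right.post)
```

Note how an F-shaped son simply extends its sibling's M fields, while two application-carrying sons are exactly the situation in which an application symbol meets its right neighbour.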
As an illustration, in Figure 10 some links among cells of T are labelled with one of F, G, H, and I. (Single links are always labelled with F or G; double links with H or I.)
Using the notation just developed, Figure 22 specifies how an arbitrary cell of T is partitioned in response to the information arriving at it (in the form of F, G, H, or I) from the lower levels of T, and what information is to be sent further up the tree. Each of the 4*4=16 different input combinations produces one of the eight possible partitioning patterns, shown on the left of Figure 22 in our symbolic notation, and one of the four possible output combinations.
Figure 22 indicates that a node of T may be required to do any of the following processing activities in the course of partitioning:
(1) route contents of registers from input to output (as in Fig. 22b);
(2) combine two M fields (as in Fig. 22a) by taking the minimum of the two arriving MLN values, and by keeping the three leftmost symbols from the six arriving ones;

(3) whenever an application symbol meets its right neighbour (as in Fig. 22i), determine by testing according to Section A.4 if the area is active (in Figure 22, "RA" denotes, and a brace connects, the fields whose contents are to be used in the test);
(4) if the cell in question is the root node of T, then it declares (as it would in Figs. 22a, 22b) that the rightmost application located in L, and corresponding to fields denoted by "ra" in Figure 22, is an innermost application;
(5) if the cell contains the root node of an active area, then the root node determines which microprogram to ask for.
This processing activity may be performed in a node of the area which is below the top of that area. The test of Section A.4 requires that an arbitrary ( symbol be brought together with the nearest ( symbol on its right in L (recall that we decided to refer to them as neighbours), and hence we shall have to make sure that each ( symbol goes up in T far enough to meet both its right and left neighbours; an arbitrary ( symbol will learn from its right neighbour if it is the left end of an RA, and it will tell its left neighbour if the left neighbour is the left end of an RA. (The two special cases, the leftmost and the rightmost ( symbols in L, must go up to the root cell of T to find out about their special status.) As an example, consider the third area from the left in Figure 10.
The ( symbol of this area moves through cells 120 and 110, and in cell 105 it meets its right neighbour, which came through cells 122 and 111. The test of A.4 is executed here, in cell 105, and the area is declared active. However, the top of this area is further up, in cell 102, where this ( symbol meets its left neighbour, which came through cells 117, 108, and 104.
We may summarize the process of locating left ends as follows. All ( symbols begin to move towards the root of T simultaneously, each one causing nodes of T to be partitioned (according to the rules of Figure 22), each one carving out an area for itself. (It might be helpful to imagine that each ( symbol is building a "wall" through T, which is to become the left-hand boundary of that area.) The ( symbols climb up in T only until they meet both their neighbours. By the time the root node of T enters state k+1, the process of locating left ends is completed, the root nodes of active areas are all known, and each such root node knows which microprogram is to be applied to its RA.
During the next downward cycle, while entering states k+2, the following occurs in the active areas just located: (1) the ALN of the ( symbol is sent to every symbol of the RA, in order that they can compute their RLN; (2) the right end of the RA is located, and subtrees of the area whose leaves hold symbols of the program text not in the RA are cut off from the active area.
Location of the right end of an RA is carried out with the help of a two-bit code W, indicating which of three possible statements characterizes a node of the active area: if W=00, then the whole subtree under the node should be cut off; if W=11, then the whole subtree remains in the active area; and if W=10, then the dividing line is in the subtree. W is initialized to 10 on top of the area, and further W values are computed with the following rules: (1) if a node has only one successor in the area, W is propagated one level down; (2) if a node has two successors in the area, Figure 23 gives the rule to compute the value of W for the two successor nodes. (In Figure 23, ALN1 is the ALN of the ( symbol, and MLN was left in the nodes of the area by the previous upward cycle.) Figure 24 shows an example of locating the right end of an RA, also indicating the values of W at certain points in the active area. (A subtree being cut off from an active area means that only state changes can get through the dividing line. As a result, such a subtree will go through the same sequence of states as the active area, but no processing will take place in it.)
b. Distribution of Microprograms
In this section we describe how segments of the microprogram find their way from the top of the active area to their respective destinations in L. The outline of this process is as follows. With state change k+2 → k+3 (an upward cycle), certain information is placed into nodes of the active area.
This information, which we call the directory, will permit the segments to navigate downward through the area to their destinations. As soon as parts of the top cell of T enter state k+3, the microprograms requested begin moving in (see B.7). This is a mode 11 operation, in which information flows only towards L. The microprograms move through T, and no state changes occur until all microprograms have moved through the root cell of T. The next state change, k+3 → k+4, moves down T following the last microprogram, and hence when cells of L enter state k+4, they will have received all their microinstructions and will be ready to begin execution.
The directory has three components, denoted by P, Q, and R, which contain information about the distribution of RLNs in the segment of L under the node of the active area in question.
P contains the total number of 1's in the segment (P can be 0, 1, or 2), Q contains the number of 2's after the last 1 from left to right (Q can be arbitrarily large), and R contains the number of 3's after the last 2 from left to right (R can be arbitrarily large). The rules to compute the directory are contained in Figure 25.
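The definition of P, Q, and R can be computed directly from the RLNs of a segment; the following sketch does this for a whole segment at once (our function name, and a sequential computation rather than the tree-combining rules of Figure 25).

```python
def directory(rlns):
    """Compute the directory triple (P, Q, R) for a segment of L, given
    the RLNs of its occupied cells in left-to-right order.
    P: number of RLN=1 symbols (at most 2 in a well-formed RA segment);
    Q: number of RLN=2 symbols after the last RLN=1;
    R: number of RLN=3 symbols after the last RLN=2."""
    last1 = max((i for i, r in enumerate(rlns) if r == 1), default=-1)
    last2 = max((i for i, r in enumerate(rlns) if r == 2), default=-1)
    P = sum(1 for r in rlns if r == 1)
    Q = sum(1 for i, r in enumerate(rlns) if r == 2 and i > last1)
    R = sum(1 for i, r in enumerate(rlns) if r == 3 and i > last2)
    return P, Q, R
```

In the machine these triples are of course built incrementally, node by node, as information flows up the area; the sequential version above only fixes what the result must be.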
The nature of the directory underlies the restrictions we had to place on the destination expressions. Since the directory has no information about symbols with RLN ≥ 4, destination expressions cannot be of the form E/i or S/i where i ≥ 4.
Figure 26 shows the essential features of distributing microinstructions in an active area. The segments of the microprogram enter on the top of the area, one after another.
Each segment is broadcast to every cell of L in the area, but only occupied cells that are specifically instructed to accept a segment will do so. The instruction to accept a segment is generated on the top of the active area, and it travels just in front of the segment to which it applies.
As Figure 26 suggests, specifying the symbols (possibly only one) that should accept the segment of the microprogram in question can be done by specifying the two end-points of an interval of cells in L. Any symbol in the RA with RLN ≤ 3 can be pinpointed by specifying a triple (p,q,r) with the following interpretation: starting at the left end of the RA, pass by p symbols with RLN=1, then pass by q symbols with RLN=2, then pass by r symbols with RLN=3; this puts you to the immediate right of the symbol desired. If, on the top of an active area, we have the specification of a single symbol in the form of such a triple, then the directories contain the information to enable us to find this symbol by moving down through the area.
Thus these triples serve the function of an "absolute address".
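The triple carried by each symbol can be generated by a single left-to-right pass over the RLNs, which makes the "absolute address" interpretation concrete. This is our illustrative reading of the counting rule just stated, not the machine's tree-walking implementation.

```python
def address_triples(rlns):
    """Assign to each symbol of the RA the triple (p, q, r): the counts
    of RLN=1, RLN=2, and RLN=3 symbols passed when walking from the left
    end of the RA up to and including the symbol itself, with q reset at
    each RLN=1 symbol and r reset at each RLN=2 symbol. Symbols with
    RLN >= 4 share the triple of the nearest RLN<=3 symbol on their
    left, which is why they cannot be addressed individually."""
    p = q = r = 0
    triples = []
    for rln in rlns:
        if rln == 1:
            p, q, r = p + 1, 0, 0
        elif rln == 2:
            q, r = q + 1, 0
        elif rln == 3:
            r += 1
        triples.append((p, q, r))
    return triples
```

For example, the operator expression's leftmost symbol comes out at (1,0,0) and the operand's at (2,0,0), matching the interval "from (1,0,0) up to but not including (2,0,0)" used in the text.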
Although the symbols in L have only their RLN stored with them, the directories stored in T give us the effect of having a triple stored with each symbol of the RA, as illustrated by Figure 27. This figure also suggests an easy way to specify intervals that contain expressions of unknown size: we shall say that an expression extends (from left to right) from (p,q,r) up to but not including (p',q',r')--for example, we do not have to know how many symbols there are in the operator expression to be able to say that it extends from (1,0,0) up to but not including (2,0,0).
Using this method of specification, the cell on the top of the active area will use the destination expression (which gives a "relative address") of the next segment to be distributed, and the second triple of the preceding segment, to produce two such triples (each one an "absolute address") specifying a sequence of cells in L which are to accept the segment of the microprogram in question. We will forego giving further details of the process of distributing microprograms.

~. ..~

9 ~ $ ~
c. Marking Expressions
In Section B.5.b Figure 20 was used to illustrate the full effect of the statement "mark with x", in which the programmer chooses the symbol x only to distinguish this mark statement from others in the same area. The mark statement also causes a number of registers to be set in each marked cell of L. This statement is implemented in each appropriate cell of L as follows: (1) in state k+4 the microprogram places x into the MARKER1 register, and if RLN=j+1, then x is also placed into the register MARKER2 (here j is the RLN of the leftmost symbol of the marked expression, which can be obtained by the cell from the destination expression); (2) next, during cycles k+4→k+5 and k+5→k+6, the values in N1 and L1 are generated from the markers in MARKER1, and at the same time the values in POS# and LENGTH are generated from the markers in MARKER2; (3) finally, during cycles k+6→k+7 and k+7→k+8, the contents of POS# are considered as markers, and the process is repeated, resulting in values placed into N2, L2 and L/R. (On occasion we shall refer to the computation of N1, L1, POS#, and LENGTH from MARKER1 and MARKER2 as first-level marking, and to the computation of N2, L2, and L/R from POS# as second-level marking.) When the mark statement is executed, nearly the same marking process is performed three times, and so we explain only the aspects common to them. Certain details that we omit, such as algorithms to place serial numbers not only into marked cells but also into unmarked ones on their right (as in the computation of POS#), and algorithms to pinpoint the leftmost and rightmost cells of an expression (as in the computation of L/R), can be obtained with minor modifications of the algorithms that follow.
Let us first consider a single mark statement in the whole area, and only the computation of N1 and L1. In the upward cycle we count the number of occurrences of the marking symbol x, and put the counts--called SUM--in nodes of the area.
During the next downward cycle we index the symbols of the marked expression with positive integers, and place the index of each symbol in N1. This is done by computing and sending down each branch the index (denoted by IN) of the leftmost marker in the segment. After initializing IN to 1 on the top of the area, the algorithm of Figure 28 can be used in each node to compute IN.
At the same time that values are being computed for N1, the value for L1, denoting the total number of symbols in the marked expression, is broadcast to all cells of the area: this is equal to the value of SUM at the top of the area.
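The two sweeps can be modelled with a recursive sketch (an illustrative rendering, not the microcode of Figure 28): the upward pass computes SUM for every segment, and the downward pass delivers IN, the index of the leftmost marker in the segment, down each branch.

```python
def mark_indices(marked):
    """Given which cells of L hold the marking symbol, return for each
    marked cell its index N1 (1-based, left to right); unmarked cells get 0."""
    n1 = [0] * len(marked)

    def total(lo, hi):                    # upward pass: SUM for segment [lo, hi)
        if hi - lo == 1:
            return 1 if marked[lo] else 0
        mid = (lo + hi) // 2
        return total(lo, mid) + total(mid, hi)

    def deliver(lo, hi, index):           # downward pass: IN for segment [lo, hi)
        if hi - lo == 1:
            if marked[lo]:
                n1[lo] = index
            return
        mid = (lo + hi) // 2
        deliver(lo, mid, index)                    # left child keeps IN
        deliver(mid, hi, index + total(lo, mid))   # right child skips left's SUM

    deliver(0, len(marked), 1)            # IN is initialized to 1 at the top
    return n1
```

In this model the broadcast value L1 is simply the SUM at the top of the area, i.e. total(0, len(marked)).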
However, we may have to consider a large number of different marker symbols in the same active area in both first- and second-level markings, partly because the second-level marking in a single mark statement requires it, and partly because there can be an arbitrary number of different mark statements, one per segment of the microprogram in the active area. All marked expressions of the first level are mutually disjoint because any segment of the microprogram contains at most one mark statement, and all marked expressions on the second level are mutually disjoint, because they are elements of sequences marked on the first level. On the other hand, the unbounded number of different markers in an area might signal complications in trying to carry out all these markings at the same time. Fortunately, we can argue that any node of an active area is required to be involved in at most four marking processes on either level of markings.
The outline of the argument is as follows. Assume that the segment of L under a node of the active area contains symbols marked with A, B, Y and Z, in that order from left to right. The expressions marked with B and Y are fully contained in the segment, and thus the numbers of their occurrences are already known, and need not travel any higher up in the area.
On the other hand, the expressions marked with A and Z may not be fully contained in the segment, so the number of their occurrences will have to move higher up in the area to be totalled up. This means that any link inside an active area will have to carry the partial totals of the leftmost and rightmost expressions marked in the segment, and consequently any node will have to deal with at most four such partial totals arriving at it. This argument strongly resembles the one discussed at length under partitioning, and so we do not elaborate any further on it.
d. Messages
The message mechanism, activated by send statements, is the chief means of communication between the cells of L holding an RA. The microprogram determines which cells send messages, it specifies when the first set of messages should be sent, and it determines which cells accept which messages.
Two problems arise in the implementation of the message mechanism:
(1) Whenever messages are sent which are to be combined (i.e., their processing is a mode I operation), we must ensure that all and only those are combined that are supposed to be.
(2) Whenever messages are sent that are not to be combined (i.e., their processing is a mode II operation), they must all pass through the root node of the area one by one. The length of such a process is data dependent, and we must allow adequate time for its completion.
Sequenced messages are distinguished by the value of i in the sendi statements. These indexed messages will never interfere with each other, because sendi+1 cannot begin unless sendi is over. The completion of sending all messages with index i can be detected by the cell on top of the active area simply by observing that there are no more messages trying to go through it. Then a completion signal can be broadcast to all cells of L in the area, initiating the next set of indexed messages, or leading to the transition 61→12 (unless data movement in the same area is still going on).

A solution to problem (1) is provided by allowing no more than one operator to occur for each value of i in the sendiC statements of any microprogram.
A solution to problem (2) is the following. In state 4 all the microprograms are already in the cells of L. With transition 4→5 we determine if there is any chance of not being able to deliver all the messages in the basic cycle (states 1 to 12), and if so, the area chooses 16, instead of 6, as its next state. (The presence of insert statements in the area also results in this transition.) A simple way to determine which state to enter after state 5 is the following: choose 16 as the next state whenever (1) there is any sendi (as opposed to sendiC) statement in the area, or (2) there are only sendiC statements, and a certain relationship holds between the maximum value of i and the statement labels of the sendiC statements (e.g. if send1C is labelled with 8, there is time between states 8 and 12 for at most send1C and send2C).
Once 16 is chosen as the next state after 5, the sequence of states for the area is 16-...-24-51-52-... (unless the area contained insert statements, and was forced to enter state 40 after 19--see Section B.6.f.1), after which it can go through the state sequence 51 through 64 as many times as necessary to complete the process of delivering all the messages.
Delivery of messages will have to be periodically interrupted, because in states 63 and 64 storage management is taking place. To avoid any ill effects of such an interruption, we permit cells in L to emit messages only in states 52 through 60. As soon as a node of the area enters state 61, all messages in it that are on their way up have their direction reversed, and are sent back to the cells of L which emitted them. By the time the root node of the area enters state 61, all messages in the area are moving downward, and by the time L enters state 62, T is empty and storage management can begin. (Messages that were not able to reach their destinations will be sent again during the next cycle.)
e. Data Movement
Type C processing requirements are always implemented with insert statements, which in turn utilize data movement and storage management. Storage management involves sending data along the cells of L. Data movement involves sending data from L through T and back to L. Data movement is always initiated by insert statements, and it is the only means of duplicating and rearranging parts of the RA. It has similarities with the message mechanism (in particular, with sending messages that are not to be combined), so we will comment only on the differences.
Whereas in the message mechanism the writer of the microprogram specifies the messages at will, during data movement the "message" that has to travel through the active area consists of a data marker, and the final values of the S and RLN registers of the cell. A data marker is a unique, composite information item, which guides the data to its destination. There are two kinds of data markers: whenever a whole marked expression is inserted, the contents of the register pair (MARKER1, N1) will serve as such; when, however, only an element of the marked sequence is inserted, the contents of the register triple (MARKER1, POS#, N2) will be used. Whereas messages are sent by cells in which a send statement is executed, the sources of data movement are marked expressions named in insertE or insertC statements. Copies of these marked expressions are to be inserted next to the cell of L in which the insertE or insertC statement is executed.
The process of marking expressions is complete in state k+8, and during the next cycle--with state changes k+8→k+9 and k+9→k+10--these insert statements can notify all marked expressions that are to be moved. The details of this process are explained in Section B.6.f.1. Whereas messages that are sent are accepted by cells containing RA symbols whose microinstructions refer to the appropriate registers, the data items broadcast similarly during data movement are accepted by cells which are empty except for a cell marker. (Since such cells are not completely empty, we will call them "reserved".) A cell marker contains the same information as a data marker; it identifies the cell of L in which it resides as a destination of a data item carrying the corresponding data marker. The cell markers originate in cells of L holding the corresponding insert statements, and they are placed in the empty cells during storage management.
When a reserved cell accepts a symbol and its RLN, they are placed in registers S' and RLN', and only at the end of processing the RA are these values transferred to registers S and RLN.
Whereas messages can sometimes be sent between states 4 and 12, data movement will always take place in states 52 through 61, irrespective of the number of data items to be moved. This is because in type C processing at least one symbol of the program text must be created that was not there to begin with, so reserved cells in L are created in the right places, through the process of storage management, to accommodate these new symbols.
f. Storage Management
This section describes the other internal mechanism involved in implementing the insert statement. This mechanism, called storage management, uses the information provided by the parameters of the insert statement to create the required reserved cells wherever insertions are to be made. This is done by shifting the contents of occupied cells in L. (In this section, whenever there is no danger of misinterpretation, we refer to the contents of an occupied or reserved cell simply as "symbol".) Storage management never changes the reduction language program text in L, i.e., it does not create or destroy symbols, and it does not change the left-to-right order of symbols in L; it only changes the distribution of empty cells in L (some of which become reserved in the process).

Since the number and distribution of empty cells in L is constantly changing, there is never any guarantee that whenever an RA needs additional empty cells, these empty cells will be available in its own area. Therefore, in order to have the greatest likelihood of providing all the empty cells wherever they are needed, we must abandon the existing partitioning of T, and perform storage management in the whole processor simultaneously.
f1. Preparation for Storage Management
Before storage management begins, the following information is placed into every cell of L: during storage management w symbols will enter the cell from the left (or right), and w-1, w or w+1 symbols will leave it on the right (or left). The totality of these pairs of integers in L will be called the specification of storage management.
Preparation for storage management means finding a specification for storage management with the following properties: (1) it is consistent (meaning it will not destroy symbols by, for example, trying to force two symbols into one cell), (2) it satisfies as many requests for insertions as possible, and (3) whenever it is not possible to satisfy all requests for insertions, it will satisfy all requests of some RA's so that they may proceed with execution, and it will cancel all requests of the other RA's so that they do not tie down any resources unnecessarily. (The cancelled requests will, of course, be made again at a later time.)
Most of the computations involved in the preparation for storage management are not affected by area boundaries.
In order to explain these computations, we shall assume that each cell of T will be able to hold six values, denoted by PT, NT, PTOP, E, BL, and BR, to be defined later. In addition, cells of L will have values of PT, NT, BL and BR.
Although in Figure 13 only states k+9 through k+12 are indicated as preparing for storage management, this preparation really begins in state 4. At this time all insert statements are recognized by the root node of the area, and if there is at least one of them in the area (even if only in a conditional arm that is eventually not executed), the area goes through the state transition 5→16. Each insert statement is said to represent a certain number of insertion requests, each of which is a request for a single reserved cell. By the time cells in L enter state 18, marking of all expressions has ended, and the length of every expression to be inserted has been determined at the location of the expression, although it is not yet known at the places of insertion. (In other words, an insert statement may not yet know how many insertion requests it represents.)
Now we need the following definitions: in an arbitrary node t of T,
(1) PT is the total number of insertion requests in all the areas whose tops are in the subtree of T whose root is t (for the root node of T, PT is equal to the total number of insertion requests in L, but a similar statement does not necessarily hold for other nodes of T).
(2) NT is the total number of empty cells in the segment under t.
(3) PTOP is the number of insertion requests in the areas whose tops are in t; PTOP=0 if t does not contain the top of any area. (The root cell of T may contain the tops of as many as four areas; other cells of T may contain the tops of zero, one or two areas.)
With transition k+8→k+9 the values of PT, NT, and PTOP are computed in each cell of T in the following manner.
NT is computed in cells of T by totally disregarding area boundaries: NT1 := NT2 + NT3. In cells of L, NT of each cell is set to 1 if the cell is empty and 0 if it is not.

PTOP is computed by adding up the insertion requests in each active area separately. The lengths of expressions to be inserted must be made available at the points of insertion. Only in state k+8 does it become known which cells are to execute insert statements, and what the lengths of all the marked expressions are (the first needs L/R; the second L2). The cells holding insert statements know the identity of the expressions to be inserted but not their lengths; the cells holding marked expressions have all the length information needed.
The former send information to the top of the area in the form of (CT, ID); the latter, in the form of (length, ID). In both formats either ID=marker, or ID=(marker, position#), and CT is used to accumulate the number of different insert statements which will need the expression denoted by ID. While these items of information are moving up in the active area, (1) superfluous items, such as extra copies of the same length information, can be thrown away, (2) the items can be sorted, so that they arrive at the top in order of, say, increasing ID, and (3) the values of CT for each ID can be accumulated. Point 2 means that the top does not have to store an unlimited number of items; it only has to compute, for each ID, PTOP := PTOP + CT * length. (Also, (1) (CT, ID), which comes from an insert statement, is broadcast to L so that the marked expression corresponding to ID will learn that it will have to move during data movement, and (2) (length, ID), which comes from a marked expression, is broadcast to L so that the insert statement, which wants to insert the marked expression corresponding to ID, will learn the length of that expression.)
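As a sketch (illustrative only; the function and variable names are ours), the accumulation the top of the area performs over the incoming (CT, ID) and (length, ID) items is:

```python
from collections import defaultdict

def compute_ptop(ct_items, length_items):
    """ct_items: (CT, ID) pairs arriving from insert statements;
    length_items: (length, ID) pairs arriving from marked expressions.
    PTOP is the sum over all IDs of CT * length."""
    ct = defaultdict(int)
    for c, id_ in ct_items:        # accumulate CT per ID on the way up
        ct[id_] += c
    return sum(ct[id_] * length for length, id_ in length_items)
```

For example, an expression of length 3 needed by two insert statements, together with one of length 5 needed by one, contributes 2*3 + 1*5 = 11 insertion requests to PTOP.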
PT is produced by each cell in T by computing PT1 := PT2 + PT3 + PTOP; in L, PT = 0.
Having computed PT and NT in the root cell of T, we can decide how many insertion requests to cancel temporarily.
One of the many possibilities is to carry out the following computation on top of T: assuming we want to keep a fraction p of L empty at all times, NT - PT should be at least as large as p*(2^n), where 2^n is the number of cells in L. If this is not the case, we have to retain some empty cells by cancelling at least E insertion requests, where E must satisfy NT - PT + E ≥ p*(2^n). We therefore compute on the top of T a value E = p*(2^n) - NT + PT. (Note that since p*(2^n) ≤ NT is guaranteed by the previous storage management, from the above formula to compute E we get E ≤ PT, and hence the required number of cancellations can always be made.) Next, with state change k+9→k+10, we start moving downwards in T, cancelling all insertion requests of some areas.
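With hypothetical numbers, the computation of E at the top of T reduces to the following sketch:

```python
def cancellations_needed(n, p, NT, PT):
    """E = p*(2^n) - NT + PT, clamped at zero: the number of insertion
    requests that must be cancelled to keep a fraction p of the 2^n
    cells of L empty.  Values of n, p, NT, PT here are illustrative."""
    return max(0, int(p * (2 ** n)) - NT + PT)
```

For example, with n=10 (1024 cells in L), p=0.125, NT=150 empty cells and PT=90 requests, E = 128 - 150 + 90 = 68 requests must be cancelled; with NT=300 the formula is negative and nothing need be cancelled.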
If E1 > 0 (there are still some cancellations to be made in this subtree), and PTOP > 0, then E1' := E1 - PTOP (cancel all insertion requests of these areas temporarily, and make them enter state 40 instead of 20, from state 19). If E1' > 0, divide E1' up between the two subtrees in such a way that both will be able to make the necessary number of cancellations. This means E2 and E3 are defined by the equation E2 + E3 = E1', and by the ratios E2:E3 and PT2:PT3 being equal. Since PT2 + PT3 = PT1 - PTOP, E2:E1' = PT2:(PT1-PTOP) and E3:E1' = PT3:(PT1-PTOP) will also have to be satisfied, so the rules of computation can be E2 := PT2*(E1'/(PT1-PTOP)) and E3 := PT3*(E1'/(PT1-PTOP)). (Note that since originally E1 ≤ PT1, and also E1' ≤ PT1 - PTOP, as a consequence E2 ≤ PT2 and E3 ≤ PT3, and hence all cancellations can be performed.) This way a certain number of active areas are selected for temporary cancellation, and the ones selected are the ones whose tops are closest to the top of T.
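One way to render this downward quota-splitting rule in code (a sketch only; the text does not fix the integer rounding, so we round E2 up, which still preserves E2 ≤ PT2 and E3 ≤ PT3):

```python
import math

def split_quota(E1, PTOP, PT1, PT2, PT3):
    """Cancel the requests of areas topped at this node first, then split
    the remaining quota between the subtrees in proportion PT2 : PT3."""
    E1p = max(0, E1 - PTOP)        # E1' after cancelling the PTOP requests here
    rest = PT1 - PTOP              # equals PT2 + PT3
    if E1p == 0 or rest == 0:
        return 0, 0
    E2 = math.ceil(PT2 * E1p / rest)
    return E2, E1p - E2            # E2 + E3 = E1'
```

For instance, with E1=60, PTOP=10 and request counts PT1=260, PT2=100, PT3=150, the node cancels 10 requests locally and passes quotas of 20 and 30 to its subtrees.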
It may happen that this algorithm cancels all insertion requests in the processor, thus preventing any further execution. One possible reason for this is the arbitrary order in which this algorithm makes the cancellations. (As an example, consider L containing only two RA's, one requesting 50 empty cells, the other one requesting 200 empty cells, with E=60: the above algorithm will in some circumstances cancel both requests.) In this case, the problem can be circumvented by a more "clever" algorithm for cancellations, such as one in which the value of E can change dynamically. In other circumstances, L may simply not have enough empty cells. In some cases this problem can be alleviated by temporarily removing part of the program. We shall not elaborate on either of these solutions any further.
The first half of preparing for storage management ends when cells of L enter state k+10. Those in 10 or 60 are in areas without insertion requests, those in 40 had their insertion requests cancelled temporarily, and those in 20 are going to have their insertion requests satisfied.
During the second half of the preparation, we compute the specification of storage management, and this computation too goes across area boundaries. Now there are four values of interest associated with each cell of T, each holding some information about the segment of L under the cell of T in question:
(1) NT is the number of empty cells in the segment;
(2) PT is the number of insertion requests in the segment (notice that the interpretation of PT has changed somewhat, and its value must be recomputed);
(3) BL (boundary condition on the left) specifies the number of symbols to be moved into the segment from its left (BL can be either positive or negative);
(4) BR (boundary condition on the right) specifies the number of symbols to be moved into the segment from its right (BR can be either positive or negative).
During the next upward cycle, with state change k+10→k+11, PT is recomputed. The rule of computation is PT1 := PT2 + PT3, where on the lowest level of T, PT1 is set according to the insertion requests of adjacent cells of L. (The value of NT, denoting the number of empty cells, remains unchanged.)

During the next downward cycle, with state change k+11→k+12, the values of BL and BR are computed as follows:
(1) on top of T, BL := BR := 0 (no cells are moved across the endpoints of L).
(2) starting at the root node of T, information flows downward, and in an arbitrary cell of T, BL2 := BL1, BR3 := BR1 (these are boundary conditions already given), and BL3 := BR2 (this follows from our convention for signs). To compute BR2, we set S2 = BL2 + PT2 - NT2 and S3 = PT3 - NT3 + BR3, and set the value of BR2 by the following expression:

if (S2 ≤ 0) = (S3 ≤ 0) then 0 else if |S2| ≥ |S3| then S3 else -S2

The formula shows that symbols are moved across segment boundaries only if absolutely necessary. If sign(S2) = sign(S3), then no movement takes place between the two segments, because both of them have enough empty cells. If, on the other hand, sign(S2) ≠ sign(S3), then min(|S2|, |S3|) symbols are moved across the boundary, the sign of the value of the expression properly specifying the direction of the movement.
The BL and BR values computed by this algorithm, and placed into cells of L, constitute the required specification of storage management.
Figure 29 shows an example of this second half of preparing for storage management, the computation of the BL
and BR values.
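The downward computation of BR2 and BL3 can be written out as a sketch (S2 and S3 are the net cell deficits of the left and right child segments; the rounding of the garbled original is reconstructed here, so treat the exact comparison as an assumption):

```python
def boundary_conditions(BL2, PT2, NT2, BR3, PT3, NT3):
    """Given the left segment's BL and the right segment's BR (already
    fixed by the parent node), compute BR2 and, by the sign convention,
    BL3 = BR2."""
    S2 = BL2 + PT2 - NT2           # net deficit of the left segment
    S3 = PT3 - NT3 + BR3           # net deficit of the right segment
    if (S2 <= 0) == (S3 <= 0):     # same sign: no movement across the boundary
        BR2 = 0
    elif abs(S2) >= abs(S3):
        BR2 = S3
    else:
        BR2 = -S2
    return BR2, BR2                # (BR2, BL3)
```

For example, a left segment needing 5 cells (S2 = 5) next to a right segment with a surplus of 10 (S3 = -10) yields BR2 = -5: five symbols cross the boundary, the minimum of the two magnitudes.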
f2. Storage Management Process
With state change k+12→k+13 the process of storage management--the only mode III operation--begins, and we use Figure 30 to explain its most important aspects. Figure 30, which consists of seven consecutive snapshots of L, exhibits the movement of symbols, which takes place according to the specification whose preparation is shown in Figure 29.
During storage management, the contents of occupied cells are shifted in L, and some empty cells become reserved by having cell markers placed into them. We will describe a scheme in which the contents of each occupied cell C(i) move in one direction during the first N(i) steps of the process, and then remain in that position for the duration of the process. Moreover, no occupied cell will collide with any reserved cell.
We assume that the specifications were generated by the algorithm of the previous section. In order to perform storage management as we have described, the cell markers must be laid down in a certain fashion. Consider a cell of L with BL=1 (one symbol is to enter on the left), and BR=-3 (three symbols are to leave on the right). Such a cell has insertion requests inside it, since BR+BL < -1, and consequently it must be occupied. These two facts imply that the three symbols to leave on the right must be the original symbol in the cell and two reserved cells to be generated next to it. But the symbol which will enter on the left of the cell in question may occupy the adjacent cell of L, in which case it will enter the cell in question during the first step. The only way this can be achieved is if the original symbol (together with the information about what cell markers it should generate) leaves the cell on the right during the first step, and as it moves from cell to cell in L, it leaves behind the correct cell markers and the original symbol in their correct final positions.
Figure 30 shows two examples of this. The third cell of L from the left had an insertion request of 6 on its left, so it must lead to six reserved cells (labelled with a1 through a6), followed by the original symbol. The specification says this must be brought about by emitting two symbols towards the left, leaving one in the cell, and emitting four on the right. The cell can easily determine that it should emit a1 on the left (together with the instruction to leave a2 behind on the way), should leave a3 in the cell, and should emit the original contents of the cell on the right (together with instructions to leave a4, a5 and a6 behind on the way). (Note that only during the first step does a symbol leave this cell on the right, but that symbol really represents four symbols: a4, a5, a6 and itself.)
The sixth cell of L from the left had two insertion requests, each of value two, on its two sides, so it must lead to two reserved cells (labelled with b1 and b2), followed by the original symbol, followed by the other two reserved cells (labelled c1 and c2). The specification says this must be brought about by admitting two symbols from the left, and by emitting six symbols on the right. The cell then emits its original contents, representing five symbols, which on its way is going to lay down the required symbols in L.
By the time a cell enters state k+12 it has all the information necessary to determine what to emit, in what order, and in which direction. If the segment of microprogram in the cell has more than one insert statement executed, their textual positions in the segment will determine the relative order (from left to right) of the correspondingly inserted symbols and expressions in L. The insertS (left/right, symbol, RLN) statement carries the symbol to be inserted as a parameter; its execution leads to the second and third parameters being left behind in the appropriate cell of L. The insertE (left/right, marker, increment to RLN) statement is used for insertion of an expression from another part of the program; its execution leads to three values being left behind in consecutive cells of L: the marker, an index number (between one and the length of the expression, which by now is known by the cell), and the increment to RLN (RLN will be updated during data movement). Similarly, the insertC (left/right, marker, POS#, increment to RLN) statement is used to copy an element of a marked expression; its execution leads to the marker, POS#, an index number, and the increment to RLN being left behind.

Now we can summarize what happens to the contents of an arbitrary occupied cell of L during storage management.
When the process begins, the symbol starts to move as directed by the BL and BR of the cell in which it resides. Whenever it crosses a cell boundary, the corresponding BL and BR values of the two cells are reduced (in absolute value) by j+1, where j is the number of cell markers (resulting from insertE and insertC statements) or additional symbols (resulting from insertS statements) it is going to leave behind on its way (each of these j cell markers and symbols is left in its final place, and will not have to move again during this storage management). Whenever both BL and BR of a cell of L become zero, the cell sends a signal to this effect to its parent.
Each cell of T, upon receiving a signal from both of its children, sends one to its parent. When these signals arrive at the root of T, state change k+13→k+14 takes place, storage management has ended, and the operation of the processor continues.

The operation of L during storage management can be likened to that of a so-called shift-register: the original contents of cells are shifted under local control (BL and BR), without any reference to their neighbours, because symbols will never get into each other's way. Storage management, however, is also a much more general process; different symbols may move in different directions and by differing amounts before coming to a halt.
It is easily seen that the total number of steps to complete a synchronous storage management is equal to the largest BL or BR value in L. If the largest BL or BR value is as small as possible, the specification of the storage management can be called optimal.
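Under the synchronous model, the step count can be stated directly (a sketch over a specification given as a list of (BL, BR) pairs, one per cell of L):

```python
def steps_to_complete(spec):
    """A synchronous storage management finishes after as many steps as
    the largest |BL| or |BR| value anywhere in L; a specification that
    minimizes this maximum is what the text calls optimal."""
    return max(max(abs(bl), abs(br)) for bl, br in spec)
```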

So far synchronous operation has been described, because it has simplified the description. However, just as in every other context in this machine (see Section B.3), this is an assumption that need not be made. This assumption would limit the size of machines of this sort that could be built. Further, the amount of information that has to be shifted from one cell to another (referred to as "symbol") may vary from cell to cell. (The symbol without insertion requests may consist of only S and RLN, or the old and new values of S and RLN if either of them is to be changed at the end of processing. If, however, the symbol has insertion requests, the corresponding information must also be carried, to be gradually left behind in the cells of L.) Storage management can be made fully asynchronous by requiring that a cell transfer its contents to one of its neighbours only if both cells are in state k+13, and if the neighbour has given permission for it. Such a process needs a local asynchronous control, such as described in the prior art.
7. Input/Output Operations
I/O is defined to be any communication between L and the outside world, and it always takes place through T. Assume for the sake of simplicity that the actual point of communication with the outside world is the root node of T. For larger sizes of the processor, the root node will become a bottleneck, and a more satisfactory solution can be found, for example, by designating all cells of T on a particular level to be I/O ports. This raises two kinds of problems, both easily solved: (1) some active areas will have to be entered under their tops, and (2) these I/O ports will always have to know what part of the program text is under them. The first problem can be solved by first sending input information to the top of an active area, from which it is broadcast to the entire area. A problem similar to the second one has been solved in B.6.b with the help of the directory.

In order to outline a solution to the problem of I/O, postulate (1) an I/O processor which is not part of T or L, and which directly communicates with the (single) I/O port at the root of T, and (2) internal I/O channels between L and the I/O port. As Figure 31 shows, there is a dedicated input channel and a dedicated output channel, both in the form of a binary tree. These trees are both in a one-to-one correspondence with T, a fact depicted in Figure 31 by superimposing them on T. This allows a connection to be set up between the corresponding nodes of T and the I/O channel, and thus the top of an area can immediately access either channel.
The operation of the input and output channel is nearly identical to a mode II operation inside an active area (see Section B.3). Information entering at the I/O port is broadcast down to each cell of L. On their way, copies of the information item pass by each cell of T, and one of them is thus guaranteed to arrive at the top of the active area which requested it (copies of the item reaching L directly via the input channel are thrown away). Information items sent by active areas through the output channel to the I/O port will have priorities assigned to them; requests for microprograms will have the highest priority, followed by requests for other kinds of input, whereas every other kind of output will have the lowest priority.
It should be understood that all I/O functions can also be accomplished by a parallel transfer from an external storage or buffer directly into the cells of L, without transmitting information through the tree network.
We distinguish four kinds of I/O: (1) bringing in microprograms, (2) bringing in definitions, (3) user specified I/O, and (4) loading and removing user programs.
Bringing in microprograms is characterized by the following:
(1) every RA needs a microprogram, (2) information always moves from the I/O processor to L, (3) microprograms are "short"--microprograms we have written so far require from a few dozen to a few hundred bits for their internal representations,
(4) there is no need to make room for microprograms in L--they are received and stored in designated parts of cells of L which hold symbols of the reduction language program.
Since the total volume of microprograms moved in at any time is relatively small, and the microprograms must be moved in rapidly to be able to continue execution of the freshly located RA's, microprograms are treated differently from all other I/O. Whenever with state change k+14 → k+1 a new active area is located, information identifying the microprogram it needs is generated on the top of this area. In the same cell of T this information is placed on the output channel and is then sent to the I/O processor. Later, when parts of the top cell of T enter state k+3, the requested microprograms start moving in on the input channel. While microprograms are moving in through the I/O port, all other input traffic is interrupted, and after all microprograms have entered, state change k+3 → k+4 takes place. From the input channel the microprogram enters the active area at its top, and with the help of the mechanism described in B.6.b it gets distributed in L.
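The request traffic just described can be sketched as a small piece of bookkeeping on the I/O processor side; the tuple representation of requests and the dictionary grouping are illustrative assumptions. Distinct microprogram names need to be keyed only once, however many areas ask for them:

```python
def plan_microprogram_transfers(requests):
    """Group microprogram requests arriving at the I/O processor.

    requests: (area_top, microprogram_name) pairs generated when new
    active areas are located. Each distinct name is keyed once, so a
    single copy per microprogram enters through the I/O port and the
    input channel's broadcast delivers it to every requesting area.
    """
    copies = {}
    for area_top, name in requests:
        copies.setdefault(name, []).append(area_top)
    return copies

# Three areas all request "ADD"; a fourth needs "MULT".
plan = plan_microprogram_transfers([(1, "ADD"), (2, "ADD"),
                                    (3, "ADD"), (4, "MULT")])
```

Only two copies enter the port here, one per distinct microprogram, each addressed to all of its requesting areas.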
This organization implies that whenever more than one active area requests the same microprogram, only one copy needs to be sent in through the I/O port. (Such a situation arises frequently, for example as a result of using the operator AA.)

Bringing in definitions is requested whenever a defined operator is located in the operator position of an RA.
It is characterized by the following:
(1) though not used in every RA, definitions have to be brought in frequently (defined operators correspond to conventional subroutines and their use is imperative in large programs), (2) information always moves from the I/O processor to L, (3) lengths of definitions vary widely, (4) because definitions become a part of the program text, room for them has to be made in L before they move in.
User specified input is requested whenever the RA has the following form: <INPUT, identification>. Here INPUT is a primitive operator, "identification" specifies the well-formed expression to be brought in, and the above RA reduces to the expression that is brought in.
User specified output is generated whenever the RA has the form <OUTPUT, expression>. It reduces to "expression" and its side effect is to send "expression" out to the I/O processor.
Loading and removing user programs can be handled in the following way. L always contains a sequence whose elements are the different user programs. When preparation for storage management indicates that there is adequate space in L to accept a new user program, the I/O processor injects symbols into L corresponding to the RA <INPUT, program identifier>, and this new RA in turn brings in the new user program. Removal of a user program is performed whenever it has no more RA's in it.
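A minimal sketch of this loading and removal policy, with L modeled as a Python list and user programs as dictionaries; both representations, and the `ra_count` field, are assumptions for illustration only:

```python
def load_user_program(L, program_id, space_available):
    """If storage management reports adequate space, inject the symbols
    of the RA <INPUT, program identifier>; reducing that RA later
    brings in the program text itself."""
    if space_available:
        L.append(("RA", "INPUT", program_id))
    return L

def remove_finished_programs(programs):
    """A user program is removed once no RA's remain in it."""
    return [p for p in programs if p["ra_count"] > 0]

L = load_user_program([], "prog-7", space_available=True)
live = remove_finished_programs([{"name": "a", "ra_count": 0},
                                 {"name": "b", "ra_count": 2}])
```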
User specified I/O as well as loading and removing user programs are characterized by the following:
(1) in general this type of I/O is least frequent, (2) it must be possible for information to move either into, or out of, the processor, (3) the amount of information to be moved varies widely, (4) room has to be made in L whenever information is moved into L.
Input information other than microprograms can be moved through the I/O port in every state, with the exception of k+3, when microprograms move in, and k+13, when storage management is taking place.
Requests for every kind of input other than microprograms occur with state change k+14 → k+1, and length information comes in with the microprograms. If storage management can satisfy all insertion requests, the I/O processor is informed about it, and after storage management is complete, the input begins to move in. The active area to which the input is directed moves through states 51 through 64 as many times as necessary until the whole input expression has arrived.
User specified output enters the output channel from the top of the area, while the area is moving through states 51 through 64 as many times as necessary. The I/O processor can immediately detect when a user program contains no more RA's, and then arranges for all symbols of this expression to enter the output channel directly from L.
C. Performance Evaluation

In this section, capabilities of the processor described will be examined with regard to execution times of individual user programs and throughput.
Parallelism On the Reduction Language Level

First let us assume that the processor holds a single user program. The processor will begin the reduction of all innermost applications simultaneously, and will complete all of them without delay, unless the total amount of expansion of the program text cannot be accommodated by L (in which case evaluation of some of the RA's will be temporarily delayed).
Thus the larger the number of RA's in the program text and the less frequent the delaying of their execution, the better the performance of the processor. Both of these factors are, to a considerable degree, under the control of the programmer. In the prior art it has been found that many frequently occurring programming problems permit, quite naturally, several different solutions with varying degrees of parallelism. (The operator AA provides one way of initiating several parallel execution paths.) In such problems there is a natural time-space trade-off; the programmer can choose the algorithm formulation most appropriate for the circumstances.
In the course of executing most user programs, the degree of parallelism is algorithm dependent, and a low degree is often unavoidable. Nevertheless, whatever the nature of the program might be, the processor initiates and terminates execution of RA's (i.e. execution paths) with ease.

If the single user program does not fill up the processor completely, additional user programs can be introduced into the available space. Executing several user programs simultaneously increases throughput without causing problems for the processor, because a collection of user programs can still be considered as a single expression in a reduction language.
Execution Time of A Single Reducible Application.
In order to simplify the argument that follows, let us assume that the time needed to get through the state diagram (from states k+1 to k+14) is constant, and is denoted by S.
Let us further assume that messages, as well as symbols during data movement, move through the top of the area at a constant rate. Here we consider only messages that are not to be copied. If the RA in question has type A processing requirements, m = 0, and the time needed for the reduction is exactly S. If the RA in question has type B or C processing requirements, then m ≥ 1, and the area will go through states 51-...-64 as many times as necessary, each time delivering M messages.
Consequently the time needed to execute an arbitrary RA is given by the formula S*⌈m/M⌉ + S. This nonlinear function is bounded from above by the linear function (2*M-1+m)*S/M. This simplified formula clearly indicates that beyond a certain lower limit the amount of data moved (or messages sent) is the dominant factor in the execution time of any RA.
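The formula and its linear bound can be checked numerically; taking S = 1 as the time unit is an arbitrary choice for illustration:

```python
import math

def ra_time(m, M, S=1.0):
    """Time to execute an RA that must move m messages, M per cycle:
    one pass through the state diagram plus ceil(m/M) delivery cycles.
    A type A RA has m = 0 and takes exactly S."""
    return S * math.ceil(m / M) + S

def linear_bound(m, M, S=1.0):
    """The bounding linear function (2*M - 1 + m) * S / M."""
    return (2 * M - 1 + m) * S / M
```

For large m both expressions grow like m*S/M, which is the sense in which data movement dominates the execution time.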
A more careful look at what this simple formula says reveals that the many factors contributing to the execution time of an arbitrary RA can be classified as being either independent of, or dependent upon, the contents of L. Factors of the first kind could be called structural: they are determined by the processor implementation (e.g. speed of the logic circuits and the widths of the data paths). Factors of the second kind are use-related; they are partly determined by the RA itself and by its microprograms (e.g. the intrinsic requirements of the operator for data movement, and how it is realized with the microprogram), and partly by what else is going on in the processor at the same time (i.e. what the "environment" of the computation is). Factors of this latter kind are attributed either to storage management or to I/O, because these are the only ways two RA's might influence each other with regard to execution times. These use-related factors are again, to some degree, under the control of the programmer: execution times can be decreased by giving preference to operators with more limited requirements for data movement; writing programs that avoid frequent requests for large numbers of additional empty cells in L will decrease the time spent doing storage management, and so on.
The formula given above for the execution time of an RA is simple enough to permit quantitative analysis of algorithms along these lines.
Throughput

In order to characterize the capability of the processor in quantitative terms, we shall derive a simple formula for the total amount of computational work performed by the processor in unit time (throughput).
We choose an RA for the duration of a complete processor cycle (states k+1 through k+14) as the unit of computational work, and call it an instruction. As a result, if an RA with type B or C processing requirements goes through m such cycles, it is counted as m instructions executed. It should be emphasized that what we call an instruction here may represent the computational work performed by many instructions of a uniprocessor (e.g. a type A RA might find the sum, product, minimum, maximum, etc. of an arbitrary number of elements, it might perform typical associative processor operations such as a search on an arbitrary number of elements, and so on).


The following formula can be used to depict the most important factors influencing throughput:

throughput = D*N / (14*t*(log N) + K)

where N is the number of cells in L;

D is the average number of RA's per cell of L (thus, D*N gives the total number of RA's in the processor); D does not depend on N, but it does depend heavily on the nature of the programs currently in L;

t is the average time state changes take to move from one level of T to another (not including time for bringing in microprograms or for storage management); t does not depend on N, but it is dependent on the speed of the logic circuits used, on the width of data paths, etc.
K is the average time used up in one complete processor cycle to bring in all microprograms requested, and to complete storage management; if I/O ports always serve subtrees of the same size in T, then K will not depend on N, but it will depend on the nature of the programs currently in L.
In the above formula the numerator stands for the total amount of computational work performed by the processor in one complete processor cycle (states k+1 through k+14), whereas the denominator stands for the duration of that cycle.
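The formula can be exercised with illustrative values; the numbers chosen for D, t, and K below are assumptions, picked only to expose the N/log N trend, not measurements:

```python
import math

def throughput(N, D, t, K):
    """D*N units of work per cycle, divided by the cycle time
    14*t*log2(N) + K (14 state changes, each traversing one
    level of the tree)."""
    return (D * N) / (14 * t * math.log2(N) + K)

# Illustrative values: D = 0.05 RA's per cell, t = 1 microsecond,
# K = 2 milliseconds per cycle for microprograms and storage management.
small = throughput(2**10, D=0.05, t=1e-6, K=2e-3)
large = throughput(2**20, D=0.05, t=1e-6, K=2e-3)
```

Growing L from about a thousand cells to about a million increases throughput by nearly three orders of magnitude under these assumptions, since the cycle time grows only logarithmically in N.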
According to the above formula, arbitrarily large throughput values can be reached by increasing the size of the processor: if all other parameters remain constant, throughput increases as N/log N. (Of course, it must be added that this processing power can be applied to solving any mixture of small and large problems, in contrast to a collection of stand-alone uniprocessors which might represent the same numeric throughput value, yet totally lack the ability to cooperate in solving problems of varying sizes.)

Cost-effectiveness

In this section we take a brief look at the price one has to pay for the performance of this processor, and consider some of the important aspects in a qualitative manner only.
The cellular nature of the processor has several beneficial effects:
(1) The one-time design costs of the cells of T and L would be divided up among a large number of such cells making up one or more instances of the processor.
(2) Both design and replication costs could be kept down, since the amount of hardware contained in a cell of T or L is rather small, partly because local storage in either kind of cell is a moderate number of registers.
(3) Interconnection problems are eased, and pin restrictions are mitigated by the limited nature of interfaces; any cell of T or L communicates with at most three other cells. Moreover, any subtree of T whose leaves are cells of L communicates with the rest of the processor through at most three such points.
(4) Since only I/O ports have dedicated functions to perform in such a processor, the modular structure permits processors of widely varying sizes (but basically of the same design and using the same components) to be constructed, spanning wide ranges of performance.
Questions of hardware utilization inevitably enter considerations of cost-effectiveness. Although all cells of T and L always participate in storage management, and many of them frequently participate in I/O related activities, beyond that only those contained in active areas do useful work. Less than full utilization of hardware seems unavoidable in this processor, but it is by no means synonymous with wasting processing power. In fact, in some respects just the opposite seems to be the case: having enough empty cells in L inside and in the vicinity of an RA is a prerequisite for rapid storage management, and therefore empty cells in L should be considered as a resource whose availability generally speeds up processing.
Considering that the dominant component in the cost of present-day general purpose computers is the cost of software, the processor just described has some attractive properties:
(1) Parts of the usual system software (interpreter for high-level language, memory management, etc.) have been replaced by hardware.
(2) Certain functions of present-day multiprocessor system software (detection of parallelism in source programs, assigning processors to tasks, etc.) have been taken over by hardware.
(3) The programming language to which the processor responds directly and exclusively is expected to facilitate construction of correct programs.
D. SUMMARY
A cellular network of microprocessors is capable of efficiently executing reduction languages. Since the amount of hardware contained in each cell is small, networks composed of large numbers of cells--millions or more--can easily be envisioned.
One can consider the processor a general purpose one, since it acts as an interpreter for a high-level language.
A machine of this type can execute any program written in a reduction language, and its satisfactory operation is not restricted to any particular class of problems. (One may want to consider it a special purpose processor for the very same reason, since it responds to nothing else but reduction languages.) The role of the reduction language is absolutely vital to the high performance of the processor: the programmer, by expressing his algorithm in this language, organizes his computation to facilitate efficient execution.
High performance can be attributed to massive parallelism both on and below the reduction language level. For this class of machines, the degree of parallelism on the language level is unbounded; space permitting, all reducible applications can proceed simultaneously. Parallelism below the language level is made possible by the assignment of the cells of L to individual symbols of the source text, and by the simple linear representation of the program text. Both levels of parallelism rely on the partitioning of the processor, which is a kind of self-organization: the processor organizes itself so that its structure will suit the current program text to be processed.
Since operator, operand, and resul-t of a reducible application each may occupy a collection of contiguous cells in L, the processor finds it easy to deal with composite operators (programs) and composite operands (data structures) of differing sizes.
In the design we have presented there is no commitment to any particular set of primitives (i.e., to any particular reduction language); only the ability to execute the so-called microprogramming language is hardwired. This gives a great deal of flexibility in using the processor, and makes the amount of hardware in a cell independent of the kinds of programs that will be run on the processor.
The uses of a tree network T are manifold. It helps us reconcile the structural requirement for cellularity with the functional requirement of efficient, global control over the operation of all cells of L. We say we have efficient global control over all the cells of L because we can influence all of them simultaneously, each one differently, and in (log N) steps. Besides serving as an interconnection (routing) network, T has another unique ability: partial results can be stored in its nodes, the position of the node expressing many things about the partial result in it, such as which segment of L it applies to, what its relation to other partial results is, and so on.
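That (log N) figure follows from the broadcast frontier doubling at every level of T; a minimal sketch, assuming N is a power of two:

```python
def broadcast_steps(n_leaf_cells):
    """Steps for a message entering the root of a binary tree T to
    reach all n leaf cells of L: the reached frontier doubles with
    each level descended."""
    reached, steps = 1, 0
    while reached < n_leaf_cells:
        reached *= 2
        steps += 1
    return steps

# A million-cell L (2**20 leaves) is reached in 20 steps.
```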

Since a design must first be specified to a sufficient degree of detail before it can be analyzed and improved upon, many of the details in this paper were dictated by the desire to provide a reasonably simple and clear description. Many more or less obvious modifications can be made to the processor described here, improving execution time efficiency or reducing cell complexity, but complicating the description.
It should be noted that -the present invention relates to a unique processor architecture and a method for efficiently and directly executing applicative languages by the architecture.
Apparatus shown in block diagram form is represent-ative of building blocks of large scale integrated circuitry well known to persons skilled in the art of information handling system processor design and can be implemented by commercially available integrated circuits.
Blocks labelled control and partitioning are implemented by microprogram control circuits and storage of the type currently available for use in microprocessors.
While the present invention has been described with reference to preferred embodiments thereof, it is understood by those skilled in the art that various changes in the mechanism and apparatus may be made without departing from the spirit or scope of the invention.

Claims (23)

WHAT I CLAIM BY LETTERS PATENT IS:
1. Information handling apparatus, for parallel eval-uation of applicative expressions formed from groups of sub-expressions, said groups being separated by syntactic markers, comprising;
a plurality of interconnected cells, each containing at least one processor, each cell in a first set of said cells containing more than one processor, each of said processors in said cell in said first set being adapted to being connected in a disjoint assembly of processors independent from the other processors in said cell;
Logic means for connecting said processors to form disjoint assemblies of said processors, said logic means being responsive to said syntactic markers to partition said plurality of interconnected cells into one or more disjoint assemblies of said processors, each of said disjoint assemblies being adapted to evaluate a subexpression;
means for entering applicative expressions into said cells and for removing results from said cells after evaluation of said applicative expressions.
2. Apparatus according to Claim 1, wherein each disjoint assembly capable of evaluating a subexpression, comprises more than one processor.
3. Apparatus according to Claim 1, further comprising a second set of cells, each cell containing one or more processors, each cell in said second set of cells being connected to one of said cells in said first set of cells.
4. Apparatus according to Claim 3, wherein each cell of said second set comprises:
means for storing one or more program text symbols and wherein each cell of said first set comprises means for moving and processing information as re-quired by a microprogram control.
5. Apparatus according to Claim 3, wherein each disjoint assembly comprises at least one processor in a cell of said first set and a plurality of cells of said second set.
6. An information handling system for parallel evaluation of applicative expressions formed from groups of subexpressions, said groups being separated by syntactic markers, comprising;
a tree network of cells, each containing at least one processor, each cell in a first set of said cells containing more than one processor, each of said processors in said cell in said first set being adapted to being connected in disjoint assembly of processors independent from other processors in said cell;
partitioning means within said tree network for separating said tree network and said groups of cells into a plurality of areas.
7. An information handling system according to Claim 6, wherein each cell of said group is adapted to contain at least one symbol of said applicative expression.
8. An information handling system according to Claim 7, wherein each of said cells in said group is connected to the adjacent neighbour cells in said group to form a linear array of cells.
9. An information handling system according to Claim 8, wherein symbols of an applicative expression are stored in said group of cells in an order in which said symbols occur in said applicative expression.
10. An information handling system according to Claim 6, wherein each cell of said tree network comprises one or more processors which are operatively connected into areas under control of said partitioning means, said partitioning means being responsive to syntactic markers.
11. An information handling system according to Claim 10, wherein each cell in said tree network comprises two or more processors.
12. An information handling system according to Claim 10, wherein said tree network further comprises plural information path interconnections between cells in said tree network to permit partitioning of said network into a plurality of areas.
13. An information handling system according to Claim 6, wherein each cell in said group comprises a processor and a plurality of storage registers for storing characteristics associated with a symbol of an applicative expression to be contained in said cell.
14. A method for parallel evaluation of appli-cative expressions formed from groups of subexpressions in a network of processors, said groups being separated by syntactic markers comprising the steps of;
(a) entering information into a first group of cells each cell containing at least one processor, each of said first group of cells adapted to contain at least one symbol of an applicative expression;
(b) partitioning said network of processors into one or more disjoint assemblies of processors under control of information contained in said first group of cells, each disjoint assembly adapted to evaluate an executable subexpression;
(c) executing an executable subexpression in each disjoint assembly containing an executable sub-expression;
(d) determining whether further executable subexpressions reside in said first group of cells;
(e) repeating said partitioning step in accordance with information contained in said first group of cells after said executing step if further executable subexpressions reside in said first group of cells;

(f) removing results from said first group of cells.
15. A method according to Claim 14 wherein said step of partitioning said network processors further comprises time multiplexing said processors to form disjoint assemblies of processors in the time domain.
16. A method according to Claim 14, wherein said partitioning step further comprises the steps of:
(a) identifying symbols contained in said first group of cells which are syntactic markers;
(b) determining boundaries of a disjoint assembly of processors from a relative position of a syntactic marker within said first group of cells.
17. A method according to Claim 16, wherein said executing step further comprises the steps of:
(a) identifying an operator within said executable subexpression, which operator is comprised of one or more symbols residing in one or more cells of said first group of cells;
(b) identifying an operand within said executable subexpression, which operand is comprised of one or more symbols residing in one or more cells of said first group of cells;
(c) obtaining a microprogram corresponding to said operator;
(d) distributing a portion of said microprogram to each cell of said first group within said disjoint assembly, said portion to be distributed to each cell determined by contents of said cell;
(e) executing said microprogram within said disjoint assembly on said operator and said operand residing in one or more cells of a first group within said disjoint assembly;
(f) placing results of said microprogram execution into predetermined cells in said first group within said disjoint assembly.
18. A method according to Claim 17, further comprising the steps of;
marking, under microprogram control, further sub-expressions of said operator and said operand to permit said operator to selectively operate on different parts of said operator and said operand.
19. A method according to Claim 17, wherein said executing step operates in one of three modes of operation identified as type A, type B and type C operation depending upon the microprogram instruction to be performed.
20. A method according to Claim 19, wherein said step of executing a type A operation further comprises;
(a) executing an operation in a cell of said disjoint assembly of cells independent of operations in other cells of said disjoint assembly of cells;
21. A method according to Claim 19, wherein said step of executing a type B operation further comprises;
(a) executing a microprogram instruction in a cell of said disjoint assembly of cells requiring information transmitted from one or more other cells in said disjoint assembly of cells.
22. A method according to Claim 19, wherein said step of executing a type C operation further comprises;
(a) executing a microprogram instruction in a cell of said disjoint assembly requiring information trans-mitted from one or more other cells of said disjoint assembly;
(b) assigning cells in said group of cells to accept and store symbols resulting from said executing step such that result symbols are contained in said cells in an order in which said symbols occur in a result expression, wherein said step of assigning cells may require assignment of cells outside boundaries of said disjoint assembly depending upon the length and form of said result expression of said executing step.
23. A method according to Claim 22, wherein said step of assigning further comprises the steps of;
determining for each cell whether said cell is occupied;
identifying insertion requests resulting from said executing steps;
reassigning cells based on a number and relative position of insertion requests;
moving, substantially simultaneously, contents of said occupied cells to cells assigned in said reassigning step.
CA000338066A 1978-10-27 1979-10-19 Cellular network processors Expired CA1159151A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US05/955,414 US4251861A (en) 1978-10-27 1978-10-27 Cellular network of processors
US955,414 1978-10-27

Publications (1)

Publication Number Publication Date
CA1159151A true CA1159151A (en) 1983-12-20

Family

ID=25496802

Family Applications (1)

Application Number Title Priority Date Filing Date
CA000338066A Expired CA1159151A (en) 1978-10-27 1979-10-19 Cellular network processors

Country Status (5)

Country Link
US (1) US4251861A (en)
EP (1) EP0010934B1 (en)
JP (1) JPS5592964A (en)
CA (1) CA1159151A (en)
DE (1) DE2967022D1 (en)

US7444318B2 (en) * 2002-07-03 2008-10-28 University Of Florida Research Foundation, Inc. Prefix partitioning methods for dynamic router tables
WO2004006061A2 (en) * 2002-07-03 2004-01-15 University Of Florida Dynamic ip router tables using highest-priority matching
DE602004009324T2 (en) * 2003-09-09 2008-07-10 Koninklijke Philips Electronics N.V. INTEGRATED DATA PROCESSING CIRCUIT WITH MULTIPLE PROGRAMMABLE PROCESSORS
US7895560B2 (en) * 2006-10-02 2011-02-22 William Stuart Lovell Continuous flow instant logic binary circuitry actively structured by code-generated pass transistor interconnects

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US3812469A (en) * 1972-05-12 1974-05-21 Burroughs Corp Multiprocessing system having means for partitioning into independent processing subsystems
GB1441816A (en) * 1973-07-18 1976-07-07 Int Computers Ltd Electronic digital data processing systems
US3978452A (en) * 1974-02-28 1976-08-31 Burroughs Corporation System and method for concurrent and pipeline processing employing a data driven network
US3962706A (en) * 1974-03-29 1976-06-08 Massachusetts Institute Of Technology Data processing apparatus for highly parallel execution of stored programs
JPS5156152A (en) * 1974-11-13 1976-05-17 Fujitsu Ltd
SE387763B (en) * 1975-10-23 1976-09-13 Ellemtel Utvecklings Ab DEVICE IN A COMPUTER MEMORY FOR ENABLING A SUCCESSIVE TRANSFER, DURING OPERATION, OF AN AVAILABLE MEMORY FIELD

Also Published As

Publication number Publication date
EP0010934B1 (en) 1984-05-30
DE2967022D1 (en) 1984-07-05
EP0010934A1 (en) 1980-05-14
JPS5592964A (en) 1980-07-14
US4251861A (en) 1981-02-17

Similar Documents

Publication Publication Date Title
CA1159151A (en) Cellular network processors
Treleaven et al. Data-driven and demand-driven computer architecture
Carlsson et al. SICStus Prolog—the first 25 years
EP0096575B1 (en) Concurrent processing elements for using dependency free code
Veen Dataflow machine architecture
Mago A network of microprocessors to execute reduction languages, Part I
Dobry A high performance architecture for Prolog
Goguen et al. Concurrent term rewriting as a model of computation
Arvind et al. Dataflow architectures
EP0341905B1 (en) Computer with intelligent memory system and corresponding method
Anderson A computer for direct execution of algorithmic languages
Amamiya et al. Datarol: A massively parallel architecture for functional languages
Plevyak et al. Type directed cloning for object-oriented programs
Miller et al. Configurable computers: A new class of general purpose machines
Holloway et al. The SCHEME-79 chip
Midkiff Automatic generation of synchronization instructions for parallel processors
US4456958A (en) System and method of renaming data items for dependency free code
Alvanos et al. Automatic communication coalescing for irregular computations in UPC language
Treleaven et al. A recursive computer architecture for VLSI
Rounce et al. Architectures within the ESPRIT SPAN project
JP2729795B2 (en) Parallel computer and control method thereof
Dorochevsky et al. Constraint handling, garbage collection and execution model issues in ElipSys
Williams Software and languages for microprocessors
Carlson The mechanization of a push-down stack
Leverett Topics in code generation and register allocation

Legal Events

Date Code Title Description
MKEX Expiry