CA1289669C

CA1289669C - Multiplying unit in a computer system, capable of population counting

Info

Publication number: CA1289669C
Application number: CA000584487A
Authority: CA
Inventors: Shoji Nakatani; Koji Kuroda
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1987-11-30
Filing date: 1988-11-29
Publication date: 1991-09-24
Anticipated expiration: 2008-11-29
Also published as: DE3854321T2; DE3854321D1; AU2599388A; AU598405B2; EP0318957B1; EP0318957A3; EP0318957A2; US4989168A

Abstract

Population counting in a computer system is performed by using a multiplying unit in the computer system. The multiplying unit includes several multiplying sub-units for simultaneously executing partial multiplication among elements obtained by dividing multiplicand data and multiplier data in a regular multiplication mode. In a population counting mode, input data for the population counting is divided into population counting elements instead of the multiplier data. Population counting for the elements is performed simultaneously by using the multiplying sub-units, producing partial counted data of the population counting elements, and the partial counted data are sent to a pair of carry save and carry propagate adders by which a population counting result for the input data is obtained and output.

Description

The present invention relates to a multiplying unit in a computer system and particularly to a multiplying unit capable of executing a population counting instruction.
As computer systems have developed, data processing in them has come to be executed at high speed. In the field of graphic displays, for example, the variable density of a graph is processed rapidly by the computer system. That is, the variable density is processed at high speed by counting the number of "1" bits in numerical data, represented in binary notation, that includes graphic information. Such counting of the number of "1" bits is called "population counting" and its instruction is called a "population counting instruction". The present invention relates to the population counting executed in the computer system.
Population counting has typically been performed by a circuit exclusively provided in the computer system. With this dedicated population counting circuit, the number of "1" bits in the numerical data, usually consisting of 8 bytes, can be counted. However, the counting is performed one byte at a time, so that it takes a lot of time to count the "1" bits throughout the numerical data. If only an increase of the counting speed were to be achieved, the counting could be performed by the dedicated circuit two or more bytes at a time instead of one byte at a time. However, this is not practical because a large quantity of hardware is needed for the dedicated circuit. Thus, the use of the dedicated circuit has a problem in that not only the cost of the circuit but also the time to perform the population counting increases. The present invention intends to solve this problem by using a multiplying unit provided in the computer system.
In a computer system, particularly in a recent high speed computer system such as a supercomputer, a multiplying unit performs multiplication at high speed, two or more bytes at a time, using a carry save adder (CSA) and a carry propagate adder (CPA), which are well known adders used in the multiplying unit in the computer system. Accordingly, if the multiplying unit is allowed to be used to perform the population counting, the counting speed can be increased without providing the dedicated population counting circuit. Furthermore, in the computer system, the multiplying unit is generally not used so often and the population counting is also not performed so often. Accordingly, it can be said that using the multiplying unit for the population counting contributes to effective usage of the multiplying unit rather than disturbing the operation of the computer system.
The use of the multiplying unit has been tried by Shoji Nakatani, who is one of the inventors of the present invention, in laid-open Japanese Patent Application SHOH 62-209621, on September 14, 1987. However, according to SHOH 62-209621, there is another problem in that the multiplying unit includes only one multiplying circuit with a spill adder. Therefore, when a multiplier of a multiplying data is divided into a plurality of elements, the multiplication must be repeated in the multiplying circuit as many times as the number of the elements. For example, when the multiplier consists of 8 bytes and is divided into 4 elements, the multiplication must be repeated four times; in this case, the multiplicand of the multiplying data is not divided. Furthermore, the spill adder is needed for compensating for lower digits which appear during the repetition of the multiplication, so that they are carried up to the final multiplying results.
Generally, there are two types of multiplying unit, a first type and a second type. The first type of multiplying unit is a type whose size is considered more important than its counting speed, so that the first type of multiplying unit usually includes only one multiplying circuit. The multiplying unit in SHOH 62-209621 is the first type. The second type of multiplying unit is a type whose counting speed is considered more important than its size, so that the second type of multiplying unit includes a plurality of multiplying circuits (sub-units) operating in parallel. Accordingly, there has been a problem in that SHOH 62-209621 cannot be applied to the second type. The present invention intends to solve this problem.

An object of the present invention, therefore, is to increase the executing speed of the population counting instruction given to a computer system including the second type of multiplying sub-units.
Another object of the present invention is to decrease the quantity of electrical parts for executing the population counting instruction in the computer system.
Still another object of the present invention is to provide a system which performs the population counting, but at a lower manufacturing cost than hitherto.
The above objects are accomplished by using the second type of multiplying unit including a plurality of multiplying sub-units, particularly by using the CSAs and the CPAs provided in each multiplying sub-unit, and adding only a few electrical parts such as an adder, selectors and a few logical circuit elements to each multiplying sub-unit.
The multiplication performed in the multiplying unit is usually for executing programs in the computer system. This operating state will be called a "regular multiplication mode" hereinafter. However, according to the present invention, the computer system is modified so that the multiplying unit operates for both the regular multiplication and the population counting. In the latter case, the operating state of the multiplying unit will be called a "population counting mode" hereinafter.
In the population counting mode, numerical data for the population counting is set in a multiplier register in the second type of multiplying unit and divided into a plurality of elements. The division is performed based on the process of the regular multiplication mode; that is, the division is performed in consideration of the calculating form executed in the regular multiplication mode and the number of the multiplying sub-units of the second type of multiplying unit. The calculating form is a form for multiplying the multiplicand and multiplier given to the multiplying unit. Generally, in the second type of multiplying unit, there are several calculating forms. For example, according to some calculating forms, the multiplication is performed by multiplying the elements obtained by dividing the multiplicand and the multiplier by each other, and according to another calculating form, the multiplication is performed by multiplying the multiplicand, which is not divided, and the elements obtained by dividing the multiplier.
After the numerical data for the population counting is divided into a plurality of elements in the multiplier register, the bytes of each element are sent to the multiplying sub-units respectively, and the number of "1" bits in each element is counted by a CSA newly provided in the respective multiplying sub-unit for performing the population counting, and a half sum output (HS) and a half carry output (HC) concerning the number of "1" bits in each element are produced from the newly provided CSA. These HS and HC are sent to a CSA and a CPA which have been provided for each multiplying sub-unit and added thereby, using the well known Booth's algorithm.
The counted results of the numbers of "1" bits in the respective elements are sent from the multiplying sub-units to a common CSA and a common CPA which also have been provided in the second type of multiplying unit, in which the counted results from the multiplying sub-units are added, and the final result of the population counting is output from the common CPA.
As mentioned above, according to the present invention, since the hardware and the multiplying algorithm of the multiplying sub-units in the second type of multiplying unit can be used effectively in parallel, the population counting can be performed at high speed, using less hardware.
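The overall scheme summarized above can be illustrated with a minimal software sketch. It is only a behavioral model under stated assumptions: the function names `split_elements`, `count_ones` and `population_count` are hypothetical, the element size of 32 bits follows the first embodiment, and the loop over elements stands in for sub-units that operate simultaneously in the hardware.

```python
def split_elements(data: int, total_bits: int = 64, element_bits: int = 32) -> list[int]:
    """Divide the input data into equal-sized elements (lowest element first)."""
    mask = (1 << element_bits) - 1
    return [(data >> shift) & mask for shift in range(0, total_bits, element_bits)]


def count_ones(element: int) -> int:
    """Partial population count of one element (one sub-unit in the hardware)."""
    return bin(element).count("1")


def population_count(data: int) -> int:
    """Add the partial counts, as the common CSA and CPA do for the sub-unit outputs."""
    partial_counts = [count_ones(element) for element in split_elements(data)]
    return sum(partial_counts)
```

The point of the sketch is only that the count of the whole word equals the sum of the counts of its elements, so the partial counts can be produced independently and added at the end.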
The invention may be summarized, according to one aspect, as a multiplying unit for a computer system, for performing multiplication of multiplicand data and multiplier data in a multiplication mode and for performing population counting of population counting input data in a population counting mode, said multiplying unit comprising: means for dividing the multiplicand data into a plurality of multiplicand elements having a first size in the multiplication mode; dividing means for dividing the multiplier data into a plurality of multiplier elements having a second size in the multiplication mode, and for dividing the population counting input data into a plurality of population counting elements of the second size in the population counting mode; a plurality of multiplying sub-units for executing simultaneously partial multiplication among the multiplicand elements and the multiplier elements when in the multiplication mode to produce partial product data, and for executing simultaneously partial population countings for the population counting elements when in the population counting mode to produce partial counted data; and means for adding the partial product data from said multiplying sub-units and outputting a multiplication result of the multiplicand data and the multiplier data when in the multiplication mode, and for adding the partial counted data from said multiplying sub-units and outputting a population counting result of the population counting input data when in the population counting mode.
According to another aspect, the invention provides a multiplying unit for a computer system, for performing multiplication of multiplicand data and multiplier data in a multiplication mode and for performing population counting of population counting input data in a population counting mode, said multiplying unit comprising: means for storing the multiplicand data in the multiplication mode; dividing means for dividing the multiplier data into a plurality of multiplier elements having an element size in the multiplication mode, and dividing the population counting input data into a plurality of population counting elements of the element size when in the population counting mode; a plurality of multiplying sub-units for executing simultaneously partial multiplication among the multiplicand data and the multiplier elements when in the multiplication mode to produce partial product data, and for executing simultaneously partial population countings for the population counting elements when in the population counting mode to produce partial counted data; and means for adding the partial product data from said multiplying sub-units and outputting a multiplication result of the multiplicand data and the multiplier data when in the multiplication mode, and for adding the partial counted data from said multiplying sub-units and outputting a population counting result of the population counting input data when in the population counting mode.
The invention will now be described in greater detail with reference to the accompanying drawings, in which:
Figure 1 is a block diagram of the population counting circuit of the prior art provided in the computer system;
Figure 2 shows the 8-byte input and output data of the population counting instruction;
Figure 3 is a block diagram of the population counting circuit of the prior art;
Figure 4 is a block diagram of a first embodiment of the present invention;
Figure 5 is a block diagram illustrating the population counting mode in the first embodiment;
Figure 6 is a circuit diagram of the first selector and a part of the multiple generator;
Figure 7 is a circuit diagram of the second selector and a part of the multiple generator;
Figure 8 is a schematic chart illustrating a method of addition in the sub-unit for population counting;
Figure 9 is a schematic chart illustrating a method of addition of the number of "1" outputs from the four multiplying units in the first embodiment;
Figure 10 is a block diagram of a second embodiment of the present invention;
Figure 11 is a schematic chart illustrating a method of addition in the sub-unit to obtain the full sum and full carry; and
Figure 12 is a schematic chart illustrating a method of addition of the number of "1" outputs from the four multiplying units in the second embodiment.
Before describing the specific embodiments of the present invention, the prior art dedicated population counting circuit and the prior art first type of multiplying unit capable of performing the population counting will be briefly explained with reference to Figures 1 to 3.
Figure 1 is a block diagram of the dedicated population counting circuit of the prior art provided in the computer system. In the dedicated population counting circuit, the population counting is performed as follows: numerical data for the population counting, consisting of 8 bytes in binary notation, is given to a register (REG) 50; the 8-byte data is transferred to a REG 52 through a selector 51, and the byte at the lowest position of the 8-byte data in the REG 52, which will be called the lowest byte in the REG 52 hereinafter, is sent to an operation circuit 53 in which the number of "1" bits is counted, converted to a binary numeral and sent to a CPA 54; in the CPA 54, the number of "1" bits in the lowest byte in the REG 52 is counted and stored in an intermediate REG 55; during the above step, the 8-byte data in the REG 52 is shifted to the right so that the byte next to the lowest byte treated in the above step is set at the lowest byte position; then the number of "1" bits in this next lowest byte is counted by the same process as before and the counted result for the next lowest byte is added to the counted result for the lowest byte; in the CPA 54, the count of the number of "1" bits in each byte is repeated byte by byte and these counted results are added, and the output from the CPA 54 is sent to a result REG 56 from which the number of "1" bits in the 8-byte data is output as shown in Figure 2. Thus, in the prior art dedicated population counting circuit, the count of "1" bits has been performed by repeating the counting of the number of "1" bits in one byte eight times, which wastes a lot of time. This counting could be performed two or more bytes at a time; however, this cannot be realized in practice because of the cost of the hardware.
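The byte-serial procedure of the dedicated circuit can be modeled roughly as follows. This is a behavioral sketch only; the function name `popcount_byte_serial` and the variable names are hypothetical, and the model ignores timing details of the selector and registers. It returns the count together with the number of byte steps to make the eight repetitions visible.

```python
def popcount_byte_serial(data: int) -> tuple[int, int]:
    """Count "1" bits one byte at a time, as in the dedicated circuit of Figure 1.

    Returns (count, steps), where steps is the number of byte-wide passes.
    """
    shift_register = data & 0xFFFFFFFFFFFFFFFF  # plays the role of REG 52
    running_total = 0                           # plays the role of intermediate REG 55
    steps = 0
    for _ in range(8):
        lowest_byte = shift_register & 0xFF
        # Operation circuit 53: count the "1" bits of the lowest byte.
        byte_count = bin(lowest_byte).count("1")
        # CPA 54: accumulate into the running total.
        running_total += byte_count
        # Shift right so the next byte becomes the lowest byte.
        shift_register >>= 8
        steps += 1
    return running_total, steps
```

For an 8-byte input the loop always runs eight times, which is the delay that the multiplying-unit approach is intended to avoid.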
To solve the above problem of the dedicated population counting circuit, the use of the multiplying unit for the population counting has been tried as shown in Figure 3. However, in this trial, the multiplying unit is the first type of multiplying unit including only one multiplying circuit with a spill adder; consequently there remains a problem of counting speed as stated below.
When the first type multiplying unit shown in Fig. 3 operates in the regular multiplication mode, it operates as follows: multiplying numerical data including the multiplicand and the multiplier, each consisting of 8 bytes in binary notation for example, is set in a vector REG (VR) 1; the multiplicand in the VR 1 is transferred to a multiplicand REG (CAND REG) 2a through a REG 1a; the multiplier in the VR 1 is set in a REG 1b and divided into four elements each consisting of 2 bytes (16 bits); each 2-byte element is sent to a decoder (DCDR) 3 in which the element is decoded into nine kinds of shift control signals, based on the well known Booth's algorithm, wherein the nine kinds of shift control signals will be called "decoded signals" hereinafter; the decoded signals from the DCDR 3 are set or stored in a multiplier REG 2b as the multiplier for the multiplicand set in the CAND REG 2a; the multiplicand in the CAND REG 2a and the decoded signals in the multiplier REG 2b are sent to a multiple generator (MG) 4 in which the multiplicand is shifted by the amounts designated by the decoded signals, this operation in the MG 4 being called multiple generation; the shifted multiplicands produced from the MG 4 are sent to a first CSA (CSA(1)) 50 and a second CSA (CSA(2)) 51 in which the shifted multiplicands are added, producing the intermediate sum and the intermediate carry of the products of the multiplicand and the element of the multiplier at REGs 6a and 6b respectively; the above process is repeated four times for obtaining the products of the multiplicand and the four elements; the outputs of the REGs 6a and 6b are sent to a first CPA (CPA(1)) 7 in which the four results concerning the four elements are added, producing the total; and the total, namely the multiplication result, is set in a result REG (ZR) 8.
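The decoding step can be illustrated with a sketch of radix-4 Booth recoding. This is an assumption: the text only names Booth's algorithm, and radix-4 recoding is chosen here because it yields nine signed digits for an unsigned 16-bit element, which is consistent with the nine shift control signals mentioned above. The function names `booth_radix4_digits` and `multiply_with_booth` are hypothetical.

```python
def booth_radix4_digits(multiplier: int, bits: int) -> list[int]:
    """Recode an unsigned multiplier into radix-4 Booth digits in {-2, -1, 0, 1, 2}.

    Digit i has weight 4**i. For an unsigned value of `bits` bits (bits even),
    bits // 2 + 1 digits are needed; the top digit is always 0 or +1.
    """
    padded = multiplier << 1  # append an implicit 0 below the least significant bit
    digits = []
    for i in range(bits // 2 + 1):
        window = (padded >> (2 * i)) & 0b111  # bits b[2i+1], b[2i], b[2i-1]
        low = window & 1            # b[2i-1]
        mid = (window >> 1) & 1     # b[2i]
        high = (window >> 2) & 1    # b[2i+1]
        digits.append(low + mid - 2 * high)
    return digits


def multiply_with_booth(multiplicand: int, multiplier: int, bits: int = 16) -> int:
    """Sum the shifted multiples selected by the Booth digits (multiple generation)."""
    digits = booth_radix4_digits(multiplier, bits)
    return sum((digit * multiplicand) << (2 * i) for i, digit in enumerate(digits))
```

Each digit selects a multiple of the multiplicand (0, plus or minus 1 times, plus or minus 2 times), so only shifts and negations are needed before the carry save addition.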
When the regular multiplication mode is changed to the population counting mode in the first type multiplying unit shown in Fig. 3, the first type multiplying unit performs the population counting as follows: the numerical data for the population counting, consisting of 8 bytes in binary notation, is set in the VR 1; the numerical data for the population counting is transferred to the REG 1b and divided into four elements each consisting of 2 bytes; each 2-byte element is selected, starting from the lowest element, by a first selector (SEL(1)) 1c provided in the first type multiplying unit, so that the lowest element is sent to a fourth CSA (CSA(4)) 12, provided in the first type multiplying unit, in which the number of "1" bits in each element is counted, producing a half sum (HS) of "1" bits in the element and a half carry (HC) produced during the process of the HS, at a HS REG 12b and a HC REG 12a, respectively; the HS and HC respectively stored in the HS REG 12b and the HC REG 12a are sent to a second selector (SEL(2)) 41, newly provided in the first type multiplying unit, in which the HS and HC are selected so as to be sent to the CSA(1) 50 and CSA(2) 51, suppressing the output of the MG 4 so that it is not sent to the CSA(1) 50; then the numbers of "1" bits in the 4 elements are added, using the hardware and the Booth's algorithm of the CSA(1) and the CSA(2) repeatedly four times and also using a spill adder (SPA) 11 for compensating for carries raised in the low units omitted during the operation of the CSA(1) 50 and CSA(2) 51; and the result of population counting of the given numerical data is output at the ZR 8.
As stated above, when the first type multiplying unit is used for performing population counting, the CSA(1) and CSA(2) are used repeatedly, as many times as the number of divided elements, which wastes a significant amount of time in counting up all "1" bits of the numerical data. This wasted time is substantially reduced by using the second type multiplying unit, which includes a plurality of multiplying sub-units; that is, the population counting can be performed in a short time by using these sub-units.
Embodying the present invention, two kinds of second type multiplying units, each including four multiplying sub-units, will be described as a first embodiment and a second embodiment, referring to Figs. 4 to 9 and Figs. 10 to 12, respectively. In each embodiment, the multiplicand and the multiplier each consist of 8 bytes.
In the first embodiment, the second type multiplying unit operates, in the regular multiplication mode, under a calculating form where the multiplicand and the multiplier are each divided into two elements, so that each element consists of 4 bytes. The multiplicand is divided into an upper multiplicand element (CU) and a lower multiplicand element (CL), and the multiplier is divided into an upper multiplier element (IU) and a lower multiplier element (IL). Then regular multiplication is performed by multiplying the elements CU x IL, CL x IL, CU x IU and CL x IU, using the four multiplying sub-units respectively, and the multiplied results from the multiplying sub-units are added by a second CSA (CSA(2')) and a second CPA (CPA(2')).
Fig. 4 shows a block diagram of the second type multiplying unit used as the first embodiment. In Fig. 4, the same reference symbol or numeral as in Fig. 3 designates the same function or part as in Fig. 3. In Fig. 4, when the second type multiplying unit operates in the regular multiplication mode, the multiplication of CL x IU, CU x IL, CU x IU and CL x IL is performed by a multiplying sub-unit A 101, which is called "sub-unit A 101" hereinafter, sub-unit B 102, sub-unit C 103 and sub-unit D 104 respectively. In the population counting mode, however, the sub-units A and B operate in the population counting mode and sub-units C and D operate in the regular multiplication mode. Therefore, only the block diagram of a multiplying circuit for the sub-units A and C is shown in Fig. 4, leaving the other sub-units B and D blank except for the registers at the input and the output of the sub-units.
Regular multiplication is performed as follows: the numerical data for performing the multiplication is set or stored in the VR 1; from the VR 1, the 8-byte multiplicand and the 8-byte multiplier are sent to the multiplicand REG 1a and the multiplier REG 1b, respectively; the multiplicand in the REG 1a is divided into the CU data and the CL data and the multiplier in the REG 1b is divided into the IU data and the IL data so that each element consists of 4 bytes; the CL data in the REG 1a and the IU data in the REG 1b are set in a REG 2a and a REG 2b in the sub-unit A, respectively; in the sub-unit A, the IU data set in the REG 2b is sent to a decoder (DCDR) 3 in which the decoded signals obtained from the IU data are produced and sent to a multiple generator (MG) 4; meanwhile, the CL data set in the REG 2a is also sent to the MG 4 in which the multiple generation is performed with the CL data and the decoded signals for the IU data; the output data from the MG 4 is sent to a first CSA (CSA(1')) 5 and a first CPA (CPA(1')) 6, in which the output data from the MG 4 is added in accordance with the Booth's algorithm, producing the partial product CL x IU at a result REG 7a; in sub-units B, C and D, the same operation as in sub-unit A is performed, producing the partial products CU x IL, CU x IU and CL x IL, respectively; these partial products are sent to a second CSA (CSA(2')) 8 and a second CPA (CPA(2')) 9 where the final result of the regular multiplication is obtained; and the final result is output to a result REG 11 through a post shifter 10 for normalization. Thus, in the second type multiplying unit, the regular multiplication can be performed by making the four sub-units operate at the same time, which shortens the operation time compared with the operating time consumed in the first type multiplying unit.
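The way the four partial products combine into the full product can be checked with a short arithmetic sketch. The function name `multiply_by_partial_products` is hypothetical, the operands are treated as unsigned 64-bit integers, and the post shifter and normalization are not modeled.

```python
MASK32 = (1 << 32) - 1


def multiply_by_partial_products(multiplicand: int, multiplier: int) -> int:
    """Multiply two unsigned 64-bit values from four 32-bit x 32-bit partial products."""
    cu, cl = multiplicand >> 32, multiplicand & MASK32  # upper / lower multiplicand elements
    iu, il = multiplier >> 32, multiplier & MASK32      # upper / lower multiplier elements

    partial_a = cl * iu  # sub-unit A
    partial_b = cu * il  # sub-unit B
    partial_c = cu * iu  # sub-unit C
    partial_d = cl * il  # sub-unit D

    # Final addition with each partial product aligned to its weight,
    # which is the role of CSA(2') and CPA(2').
    return (partial_c << 64) + ((partial_a + partial_b) << 32) + partial_d
```

Because the four partial products do not depend on one another, they can be computed at the same time, and only the final weighted addition is sequential.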
When population counting is performed by the second type multiplying unit shown in Fig. 4, the mode of the second type multiplying unit is changed to the population counting mode. In this mode the 8-byte numerical data for the population counting, which will be called the "input 8-byte data" hereinafter, is given to the VR 1, and the input 8-byte data is set in the REG 1b in which the input 8-byte data is equally divided into two elements called the IU data and the IL data, each consisting of 4 bytes. The IU data is set in REGs 2b and 2f in sub-units A and C respectively, and the IL data is set in the REGs 2d and 2h in sub-units B and D respectively. In sub-unit A, the IU data is sent to a third CSA (CSA(3')) 12 composed of sixteen half adders 12-0, 12-1, ---, 12-14 and 12-15, by which sixteen HC signals HC00, HC01, HC02, ---, HC14 and HC15 and sixteen HS signals HS00, HS01, HS02, ---, HS14 and HS15 are produced and sent to a selector 41 composed of a first selector (SEL(1)) 41a and a second selector (SEL(2)) 41b, as shown in Fig. 4 and in detail in Fig. 5. The selected data from the selector 41 is sent to the CSA(1') 5 having seventeen inputs and six steps of addition circuits. The output from the CSA(1') 5 is sent to the CPA(1') 6 in which a carry and a sum output from the CSA(1') 5 are added. The results of the addition obtained by the CPA(1') are stored in the REG 7a.
The REG 2a has a function of outputting multiplicand bit signals and inverted signals of the multiplicand bit signals in the regular multiplication mode. The output signals from the REG 2a are shown in Fig. 5; among the output signals from the REG 2a, a plus signal such as +R2-31 indicates the regular bit signal at the 31st bit position of the REG 2a and a minus signal such as -R2-31 indicates the inverted signal of the bit signal +R2-31.
Fig. 5 is a circuit diagram showing the circuit connections among the CSA(3') 12, the SEL(1) 41a, the SEL(2) 41b, the DCDR 3, the MG 4 and the CSA(1') 5. In Fig. 5, the same reference symbol or number as in Fig. 4 designates the same unit or part as in Fig. 4. The REG 2b, which is not depicted in Fig. 5, has 32 bit-positions for setting the 4-byte IU data, and the bit-signals set in the 32 bit-positions are indicated by +R3-0, +R3-1, +R3-2, ---, +R3-30 and +R3-31. In the population counting mode, the bit-signals +R3-0 to +R3-31 set in the REG 2b are sent to the CSA(3') 12 including sixteen half adders (HAs) 12-0, 12-1, 12-2, ---, 12-14 and 12-15. Two bit-signals set in bit-positions (of the REG 2b) adjacent to each other are sent to one of the sixteen HAs for performing the half addition of the two bit-signals. For example, the bit signals +R3-0 and +R3-1 set in the bit positions 0 and 1, adjacent to each other, in the REG 2b are sent to the HA 12-0 in the CSA(3') 12. In each HA, a half sum (HS) signal and a half carry (HC) signal are produced, so that 16 pairs of the HS and HC signals are output from the CSA(3') 12 and sent to the SEL(2) 41b and the SEL(1) 41a, respectively. For example, the signals +HS00 and +HC00 are output from the HA 12-0 and sent to the SEL(2) 41b and the SEL(1) 41a, respectively, as shown in Fig. 5.
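The half addition of adjacent bit pairs can be sketched as follows. The names `half_adder`, `csa3_half_sums` and `count_from_half_sums` are hypothetical, and treating bit position 0 as the leftmost bit of the 32-bit element is an assumption made for the sketch; the pairing of adjacent positions is what matters for the count.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (half sum, half carry) of two single bits."""
    return a ^ b, a & b


def csa3_half_sums(element: int) -> tuple[list[int], list[int]]:
    """Model CSA(3'): sixteen half adders over adjacent bit pairs of a 32-bit element."""
    # bits[0] corresponds to +R3-0, bits[31] to +R3-31 (assumed leftmost-first order).
    bits = [(element >> (31 - position)) & 1 for position in range(32)]
    half_sums, half_carries = [], []
    for k in range(16):
        hs, hc = half_adder(bits[2 * k], bits[2 * k + 1])  # HA 12-k
        half_sums.append(hs)      # HSkk, routed to SEL(2)
        half_carries.append(hc)   # HCkk, routed to SEL(1)
    return half_sums, half_carries


def count_from_half_sums(half_sums: list[int], half_carries: list[int]) -> int:
    """Each HS has weight 1 and each HC has weight 2 in the final count."""
    return sum(half_sums) + 2 * sum(half_carries)
```

Each pair contributes 0, 1 or 2 to the count, encoded as (HC, HS), so adding the sixteen two-bit values gives the number of "1" bits in the element.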
All 65 decoded signals +G1-POS1, +G1-NEG1, +G1-POS2, +G1-NEG2, ---, +G16-POS2, +G16-NEG2 and +G17-POS1 output from the DCDR 3 are set to "0" in the population counting mode. Accordingly, in the population counting mode, the input signals to the MG 4 are all set to "0", so that the output signals from the MG 4 also become "0" as seen from Figs. 6 and 7. Fig. 7 is a block diagram illustrating the wiring connection between the MG 4 and the SEL(2) 41b. In Fig. 7, the same reference symbol or number as in Figs. 5 or 6 designates the same unit or signal as in Figs. 5 or 6. As shown in Figs. 6 and 7, the output signals +G2-30, +G3-30, ---, +G16-30 and +G17-30 from the MG 4 are sent to the SEL(1) 41a, the output signals +G2-31, +G3-31, ---, +G16-31 and +G17-31 from the MG 4 are sent to the SEL(2) 41b, and the other output signals from the MG 4 are sent directly to the CSA(1') 5; here, the numbers 30 and 31 indicate the bit positions in the CSA(1') 5, which will be explained later with reference to Fig. 8. The output signals from the MG 4 each having the number 30 are suppressed by AND circuits in the SEL(1) 41a in the population counting mode, so that only the output signals +HC-00, +HC-01, ---, +HC-14 and +HC-15 from the CSA(3') 12 are output from the SEL(1) 41a as the input signals +G2-30-S, +G3-30-S, ---, +G16-30-S and +G17-30-S to the CSA(1') 5. In the same way, the output signals from the MG 4 each having the number 31 are suppressed by AND circuits in the SEL(2) 41b in the population counting mode, so that only the output signals +HS-00, +HS-01, ---, +HS-14 and +HS-15 from the CSA(3') 12 are output from the SEL(2) 41b as the input signals +G2-31-S, +G3-31-S, ---, +G16-31-S and +G17-31-S to the CSA(1') 5. Meanwhile, in the regular multiplication mode, the output signals from the CSA(3') 12 are suppressed at the SEL(1) and the SEL(2), and the output signals from the MG 4 are sent to the CSA(1') 5 directly and through the SEL(1) 41a and the SEL(2) 41b as seen from Figs. 5, 6 and 7.
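The selecting behavior for a single bit can be modeled as a two-way AND-OR multiplexer. This is an assumption about the gate structure, since Figs. 6 and 7 are not reproduced here; the text states only that AND circuits suppress one source in each mode. The function name `selector_bit` and its parameters are hypothetical.

```python
def selector_bit(mg_bit: int, csa3_bit: int, population_mode: bool) -> int:
    """Pass the CSA(3') bit in the population counting mode, else the MG bit.

    Modeled as (mg_bit AND NOT mode) OR (csa3_bit AND mode); the real gate
    arrangement in SEL(1) and SEL(2) may differ.
    """
    mode = 1 if population_mode else 0
    return (mg_bit & (1 - mode)) | (csa3_bit & mode)
```

In the population counting mode the MG output is already "0" because all decoded signals are "0", so in this model the suppression gate on the MG side is redundant but harmless.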
Again in the population counting mode, the signals concerning the HS and HC signals of the IU data are input to the CSA(1') 5 in which the input signals each having the number 30, for example +G2-30-S, and the number 31, for example +G2-31-S, are set at a definite bit position of the sixteen bit rows described in Fig. 8.
Fig. 8 is a chart showing schematically a way of using addition to obtain the multiplication in the CSA(1'). The chart corresponds to a partial product of 4 bytes x 4 bytes performed in the regular multiplication mode. A total of 32 bits is set in each row, which will be called a "bit row" hereinafter, in the regular multiplication mode; however, in the population counting mode, a "0" is imposed at all bit positions except the hatched positions because all the input signals to the CSA(1') 5 from the MG 4 are set to "0" in the population counting mode as stated before.
For example, the input carry signal +G2-30-S to the CSA(1') 5 is set in the bit row G2 at a bit position corresponding to the 30th bit position in a 64-bit numeral line depicted at the bottom in Fig. 8; the input sum signal +G2-31-S to the CSA(1') 5, related to the carry signal +G2-30-S, is set in the bit row G2 at a bit position corresponding to the 31st bit position in the 64-bit numeral line; the input carry signal +G3-30-S to the CSA(1') 5 is set in the bit row G3 at a bit position corresponding to the 30th position in the 64-bit numeral line; the input sum signal +G3-31-S to the CSA(1') 5 is set in the bit row G3 at the 31st bit position in the 64-bit numeral line, and so on.
Accordingly, the carry and sum data respectively set at the 30th and 31st positions of each bit row are vertically lined up. As seen from Fig. 8, bit row G1 is not used in the population counting mode.
The bit values of the input sum and carry signals set in the G2 to G17 bit rows are added in the CSA(1') 5 and CPA(1') 6.
The added result is set in the 26th to 31st bit positions, which are hatched, in the 64-bit numeral line at the bottom of the chart in Fig. 8. The result represents the number of "1" bits in the 4-byte IU data set in the REG 2b in the sub-unit A. The result is sent to the REG 7a.
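The addition of the sixteen bit rows by a carry save adder followed by a carry propagate adder can be sketched as follows. This is a minimal illustration, not the circuit of the specification: the function names are hypothetical, the order in which rows are compressed is arbitrary rather than the actual adder tree, and each row is modelled simply as a small integer holding the half carry at weight 2 and the half sum at weight 1.

```python
def carry_save_add(a, b, c):
    """3:2 compressor: reduce three operands to a sum word and a carry word."""
    s = a ^ b ^ c
    carry = ((a & b) | (b & c) | (a & c)) << 1
    return s, carry


def reduce_rows(rows):
    """Compress rows with carry-save adders, then do one carry-propagate add."""
    rows = list(rows)
    while len(rows) > 2:
        s, c = carry_save_add(rows[0], rows[1], rows[2])
        rows = rows[3:] + [s, c]
    return sum(rows)  # final carry-propagate addition (the CPA step)


def bit_rows(element):
    """Build 16 rows (G2 to G17) for a 32-bit element.

    Each row holds the half carry (weight 2) and the half sum (weight 1)
    of one pair of adjacent bits.
    """
    rows = []
    for i in range(0, 32, 2):
        a = (element >> i) & 1
        b = (element >> (i + 1)) & 1
        rows.append(((a & b) << 1) | (a ^ b))
    return rows


print(reduce_rows(bit_rows(0xFFFFFFFF)))  # 32
```

Because a 32-bit element contains at most 32 "1" bits, the result always fits in six bits, which agrees with the six hatched positions (26th to 31st) mentioned above.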
Since the input 8-byte data set in the REG 1b is equally divided into two elements, two sub-units are enough to perform the population counting. Therefore, in this embodiment, the sub-units A and B are used in both modes, the population counting mode and the regular multiplication mode, and the other sub-units C and D are used only in the regular multiplication mode.
Accordingly, the hardware and the function of the sub-unit B are the same as those of the sub-unit A, and the hardware and the function of the sub-units C and D are different from those of the sub-units A and B.
The sub-units C and D have the same function and hardware, except that the multiplicand and the multiplier in each sub-unit are different. The sub-unit C has the function of performing the regular multiplication by multiplying the CU data and the IU data in the regular multiplication mode and producing all "0" bits in the population counting mode. Therefore, the sub-unit C has hardware such as a REG 2e having the same function as the REG 2a in the sub-unit A, but no CSA(3') corresponding to the CSA(3') 12 in the sub-unit A and no SEL corresponding to the SEL 41 in the sub-unit A. As mentioned above, since the REG 2e has the same function as the REG 2a in the sub-unit A, the REG 2e outputs the regular CU data in the regular multiplication mode and all "0" bit signals in the population counting mode, so that all "0" bit signals are output from a REG 7c to the CSA(2') in the population counting mode. The block diagram for the sub-unit C is depicted in Fig. 4. Since the block diagram for the sub-unit D is the same as that for the sub-unit C, the block diagram of the sub-unit D is omitted from Fig. 4.
In the sub-unit B, the added result is set at the 26th to 31st bit positions, which are hatched as illustrated in Fig. 8, in the 64-bit numeral line at the bottom of the chart in the population counting mode. Here, the IL data is sent to a REG 2d in the sub-unit B from the REG 1b as seen from Fig. 4.
The two results output from sub-units A and B are added by the CSA(2') 8 as shown in Fig. 4. The output of the CSA(2') 8 is sent to the CPA(2') 9 and added therein. The result of the CPA(2') 9 is post-shifted by the post shifter 10 and set in the REG 11, thus storing the result data in the position for the upper 8 bytes.
Fig. 9 illustrates the adding operation of the results of the four sub-units, A, B, C and D, performed by the CSA(2') 8 and the CPA(2') 9. A symbol "R2 CAND" indicates the multiplicand consisting of the CU data and the CL data set in the REG(R2) 1a, and a symbol "R3 IER" indicates the multiplier consisting of the IU data and the IL data set in the REG(R3) 1b. In the regular multiplication mode, the addition of the partial products CLxIL, CUxIL, CLxIU and CUxIU is performed by the CSA(2') 8 and the CPA(2') 9 as shown in Fig. 9. Here, the partial products CLxIL, CUxIL, CLxIU and CUxIU are obtained from sub-units D, B, A and C respectively. However, in the population counting mode, the partial products are obtained only from the sub-units A and B, and furthermore the "1" bit results of the IU data, obtained by the sub-unit A, and those of the IL data, obtained by the sub-unit B, are both in the same bit position as depicted by the hatched portions in Fig. 9. Therefore, the result of the addition can be obtained by simply adding the hatched portions indicated by IL and IU, using the CSA(2') 8 and the CPA(2') 9 as in the regular multiplication mode. The data included in the upper 8-byte positions is sent to the REG 11 through the post SFT 10.
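The way the four partial products are combined, and why the two partial counts can pass through the same adder, can be illustrated in Python. This is a sketch only. The function names are hypothetical, and the bit weights in `popcount_by_parts` rest on an assumption that is not stated in this passage: bit positions are numbered from the most significant bit (bit 0), so that the 31st position of a 64-bit line has weight 2**32.

```python
MASK32 = (1 << 32) - 1


def multiply_by_parts(cand, ier):
    """8-byte x 8-byte product built from four 4-byte x 4-byte partial products."""
    cu, cl = cand >> 32, cand & MASK32   # upper and lower halves of the multiplicand
    iu, il = ier >> 32, ier & MASK32     # upper and lower halves of the multiplier
    return (
        (cl * il)              # CLxIL, sub-unit D
        + ((cu * il) << 32)    # CUxIL, sub-unit B
        + ((cl * iu) << 32)    # CLxIU, sub-unit A
        + ((cu * iu) << 64)    # CUxIU, sub-unit C
    )


def popcount_by_parts(count_iu, count_il):
    """Add the partial counts of sub-units A and B as if they were partial products.

    Assumption: each partial count sits at weight 2**32 in its 64-bit row,
    and rows A and B share the same 32-bit offset in the 128-bit sum.
    """
    row_a = count_iu << 32
    row_b = count_il << 32
    total = (row_a << 32) + (row_b << 32)
    return total >> 64   # keep only the upper 8 bytes


x, y = 0x123456789ABCDEF0, 0xFEDCBA9876543210
print(multiply_by_parts(x, y) == x * y)  # True
print(popcount_by_parts(32, 32))         # 64
```

Under this assumption the summed count lands at the least significant end of the upper 8 bytes, which is consistent with the upper 8-byte data being sent to the REG 11.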
The execution of the population counting instruction is summarized as follows:
1) the input 8-byte data for population counting is set in the REG 1b from the VR 1;
2) the upper 4-byte data (IU data) of the input 8-byte data set in the REG 1b is set in the REG 2b of the sub-unit A, and the lower 4-byte data (IL data) of the input 8-byte data in the REG 1b is set in the REG 2d of the sub-unit B;
3) the divided 4-byte (32-bit) data (IU and IL data) are further divided into 16 pairs of two bits, and 16 bits of sum and carry signals are obtained by 16 half adders, suppressing the route from the REG 2b to the DCDR 3;
4) the output of the half adders is input to the CSA(1') 5 through the selector 41;
5) the number of "1" bits in the IU data is obtained by addition performed by CSA(1') 5 and CPA(1') 6 in the sub-unit A;
6) the number of "1" bits in the IL data is obtained in the same way as in the sub-unit A, in the sub-unit B at the same time;
7) the number of "1" bits in the IU data and in the IL data are set in the REG 7a in the sub-unit A and the REG 7b in the sub-unit B, respectively; and
8) the data in the REGs 7a and 7b is added by CSA(2') 8 and CPA(2') 9, taking the weight of respective bits into account.
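Steps 1) to 8) above can be sketched in software as follows. This is a behavioural sketch under stated assumptions, not the hardware of the embodiment: the function names are hypothetical, registers and selectors are not modelled, and the adders are replaced by ordinary integer addition. What it keeps from the text is the split into IU and IL, the 16 half adders per 4-byte element, and the weight of 2 given to each half carry.

```python
def half_adder_pairs(element, width=32):
    """Step 3: split the element into adjacent bit pairs and half-add each pair.

    Returns two lists of bits: the half sums (HS) and the half carries (HC).
    """
    sums, carries = [], []
    for i in range(0, width, 2):
        a = (element >> i) & 1
        b = (element >> (i + 1)) & 1
        sums.append(a ^ b)      # half sum of the pair
        carries.append(a & b)   # half carry of the pair
    return sums, carries


def partial_count(element, width=32):
    """Steps 4 to 6: add all HS bits (weight 1) and all HC bits (weight 2)."""
    sums, carries = half_adder_pairs(element, width)
    return sum(sums) + 2 * sum(carries)


def population_count(data64):
    """Steps 1, 2, 7 and 8 for an 8-byte input."""
    iu = (data64 >> 32) & 0xFFFFFFFF   # upper 4 bytes, handled by sub-unit A
    il = data64 & 0xFFFFFFFF           # lower 4 bytes, handled by sub-unit B
    return partial_count(iu) + partial_count(il)


print(population_count(0xFFFFFFFFFFFFFFFF))  # 64
print(population_count(0x8000000000000001))  # 2
```

A pair of bits "11" yields a half carry of 1 and a half sum of 0, so counting the carry with weight 2 recovers the two "1" bits of that pair.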
Next, the second embodiment of the present invention will be explained.
Fig. 10 is a block diagram of the second multiplying unit illustrating the second embodiment of the present invention. The second type multiplying unit includes four multiplying sub-units 16-A, 16-B, 16-C and 16-D each having the same construction.
The second embodiment operates differently from the first embodiment. In Fig. 10, the multiplicand and the multiplier are stored in REGs 14 and 15 respectively, and the outputs of the four sub-units are added by a CSA(2'') 17 and CPA(2'') 18 and sent to a REG 20 through a post SFT 19. Only the sub-unit 16-A will be explained because the sub-units 16-B, 16-C and 16-D are the same as the sub-unit 16-A in their construction and function.
In the second embodiment, the 8-byte multiplier is divided into four 2-byte elements which are sent to the sub-units 16-A, 16-B, 16-C and 16-D, respectively. The operation for multiplication and population counting in the sub-unit 16-A is essentially the same as in the sub-unit A of the first embodiment, except that the data set in the REG 21 and in the REG 22 is 8 bytes and 2 bytes, respectively.
In the population counting mode, the 8-byte multiplicand stored in the REG 14 is sent to a REG 21 in the sub-unit 16-A and to the other three REGs having the same function as the REG 21 in sub-units 16-B, 16-C and 16-D, respectively. Meanwhile, the 8-byte input data for the population counting is stored in the REG 15 (instead of the 8-byte multiplier) and equally divided into four elements each consisting of 2 bytes of data for population counting. Each 2-byte data is sent to a REG 22 in the sub-unit 16-A and to the other three REGs, having the same function as the REG 22, in the sub-units 16-B, 16-C and 16-D. The 2-byte data set in the REG 22 is sent to a third CSA (CSA(3'')) 27. A half carry (HC) 27a and a half sum (HS) 27b output from the CSA(3'') 27 are sent to a first CSA (CSA(1'')) 25, having nine input terminals and four steps for addition, through a SEL 32. A sum and a carry output from the CSA(1'') 25 are added by a first CPA (CPA(1'')) 26. The result of the addition from the CPA(1'') 26 is set in a REG 30-A.
The same operation as in the sub-unit 16-A is executed respectively in the sub-units 16-B, 16-C and 16-D simultaneously.
The four results obtained by the sub-units 16-A, 16-B, 16-C and 16-D are added by a second CSA (CSA(2'')) 17 and a second CPA (CPA(2'')) 18 to obtain a total result of the 8-byte input data.
The output from the CPA(2'') 18 is set in a REG 20 through a post shifter 19.
Fig. 11 is a schematic illustrating a way of addition in the CSA(1'') 25 in the sub-unit 16-A to obtain the full sum and the full carry. In the sub-unit 16-A, the bit signal of carry through a first selector which is a part of the SEL 32 (not depicted in Fig. 10) and the bit signal of sum through a second selector which is another part of the selector 32 are input to terminal G2 (which is not depicted) of the CSA(1'') 25 and occupy the 48th and 49th bit positions of the 64-bit numeral row, respectively. The similar bit signals input to terminal G3 of the CSA(1'') 25 occupy the 50th and 51st bit positions, and so on.
Those input to terminal G9 of the CSA(1'') 25 occupy the 62nd and the 63rd bit positions. In Fig. 11, the same additions in the sub-units 16-B, 16-C and 16-D are indicated together.
The results of addition of the carry and sum by the CSA(1'') 25 are in the bit positions from the 59th to the 63rd, as shown at the bottom of the chart. In the same way, the bit positions of the data of the carry and sum in the sub-units 16-B, 16-C and 16-D are from the 43rd to the 47th, from the 27th to the 31st and from the 11th to the 15th respectively, as shown at the bottom of the chart in Fig. 11. The full sum and full carry obtained in the CSA(1'') 25, shown at the bottom, are added by the CPA(1'') 26 to obtain the number of "1" bits present in the first quarter part of the multiplier. Then, the data is set in the REG 30-A.
Fig. 12 is a schematic illustrating a way of addition in the CSA(2'') 17 and CPA(2'') 18 in order to obtain the total number of "1" bits present in the multiplier. The data from each of the four REGs 30-A, 30-B, 30-C and 30-D is added as an addition of partial products. The data from the REGs 30-A, 30-B, 30-C and 30-D has a width of 10 bytes. The number of "1" bits present in a quarter of the multiplier stored in the REG 15 is set in a group of hatched bits as shown in Fig. 11. The four groups of hatched bits are vertically lined up in four parallel rows shifted by 2 bytes as shown in Fig. 12. As a result, the resultant data has a width of 16 bytes. Discarding the lower 8 bytes, the upper half of the 16 bytes provides 8 bytes of resultant data, in which the total number of "1" bits present in the multiplier is set in the last seven bits.
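The second embodiment can be sketched in the same behavioural style. The sketch below is an illustration under assumptions: the function names are hypothetical, each 2-byte element is treated as eight adjacent bit pairs, and the alignment of the four partial results by 2-byte shifts in Fig. 12 is not modelled, only the fact that the four partial counts are added.

```python
def count_element(element, width=16):
    """Partial count for one 2-byte element, as in one sub-unit 16-A to 16-D."""
    hs = hc = 0
    for i in range(0, width, 2):
        a = (element >> i) & 1
        b = (element >> (i + 1)) & 1
        hs += a ^ b   # half sum, weight 1
        hc += a & b   # half carry, weight 2
    return hs + 2 * hc


def population_count_4x2(data64):
    """Divide an 8-byte input into four 2-byte elements and add the partial counts."""
    elements = [(data64 >> shift) & 0xFFFF for shift in (48, 32, 16, 0)]
    partial = [count_element(e) for e in elements]   # each in the range 0 to 16
    return sum(partial)


total = population_count_4x2(0xFFFFFFFFFFFFFFFF)
print(total)               # 64
print(total.bit_length())  # 7
```

Each partial count is at most 16 and therefore fits in five bits, matching the five-bit groups such as the 59th to 63rd positions, while the total is at most 64 and needs seven bits, matching the "last seven bits" of the result.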


Claims

1. A multiplying unit for a computer system, for performing multiplication of multiplicand data and multiplier data in a multiplication mode and for performing population counting of population counting input data in a population counting mode, said multiplying unit comprising: means for dividing the multiplicand data into a plurality of multiplicand elements having a first size in the multiplication mode; dividing means for dividing the multiplier data into a plurality of multiplier elements having a second size in the multiplication mode, and for dividing the population counting input data into a plurality of population counting elements of the second size in the population counting mode; a plurality of multiplying sub-units for executing simultaneously partial multiplication among the multiplicand elements and the multiplier elements when in the multiplication mode to produce partial product data, and for executing simultaneously partial population countings for the population counting elements when in the population counting mode to produce partial counted data; and means for adding the partial product data from said multiplying sub-units and outputting a multiplication result of the multiplicand data and the multiplier data when in the multiplication mode, and for adding the partial counted data from said multiplying sub-units and outputting a population counting result of the population counting input data when in the population counting mode.

2. A multiplying unit according to claim 1, wherein at least one of said plurality of multiplying sub-units is a first type multiplying sub-unit, each said first type multiplying sub-unit comprising: a first sub-unit register, having bit positions, for storing one of the multiplier elements in the multiplication mode, and for storing one of the population counting elements as a corresponding population counting element in the population counting mode; a second carry save adder, operatively connected to said first sub-unit register, for counting the number of "1" bits in the corresponding population counting element stored in said first sub-unit register by simultaneously inputting every pair of bits in the corresponding population counting element, each pair of bits stored in two adjacent bit positions in said first sub-unit register, and for outputting an elemental sum and elemental carry data for each pair of bits of the corresponding population counting element; a second sub-unit register for storing and outputting one of the multiplicand elements in the multiplication mode, and for generating and outputting all "0" bits in the population counting mode; a decoder, operatively connected to said first sub-unit register, for outputting decoded signals required for multiplying the multiplicand element and the multiplier element upon inputting the multiplier element from said first sub-unit register in the multiplication mode, and for generating and outputting all "0"
bits in the population counting mode; a multiple generator, operatively connected to said second sub-unit register and said decoder, for generating and outputting shifted data by combining the multiplicand element from said second sub-unit register and the decoded signals from said decoder in the multiplication mode, and for generating and outputting all "0" bits in the population counting mode; summing means, having a first carry save adder and a first carry propagate adder, for outputting the partial product data of the partial multiplication of the multiplicand element and the multiplier element upon adding the shifted data from said multiple generator in the multiplication mode, and for outputting the partial counted data of the number of "1" bits in the population counting element upon adding the elemental sum and elemental carry data from said second carry save adder in the population counting mode; and selector means for selecting one of the shifted data from said multiple generator and the elemental sum and elemental carry data from said second carry save adder to send to said summing means in accordance with the multiplication mode and the population counting mode, respectively.

3. A multiplying unit according to claim 2, wherein at least one of said plurality of multiplying sub-units is a second type multiplying sub-unit, each said second type multiplying sub-unit comprising: a third sub-unit register for storing and outputting one of the multiplier elements in the multiplication mode; a fourth sub-unit register for setting and outputting one of the multiplicand elements in the multiplication mode, and for generating and outputting all "0" bits in the population counting mode; and a decoder, operatively connected to said third sub-unit register, for outputting decoded signals required for multiplying the multiplicand element and the multiplier element upon inputting the multiplier element from said third sub-unit register in the multiplication mode, and for generating and outputting all "0"
bits in the population counting mode.

4. A multiplying unit for a computer system, for performing multiplication of multiplicand data and multiplier data in a multiplication mode and for performing population counting of population counting input data in a population counting mode, said multiplying unit comprising: means for storing the multiplicand data in the multiplication mode; dividing means for dividing the multiplier data into a plurality of multiplier elements having an element size in the multiplication mode, and dividing the population counting input data into a plurality of population counting elements of the element size when in the population counting mode; a plurality of multiplying sub-units for executing simultaneously partial multiplication among the multiplicand data and the multiplier elements when in the multiplication mode to produce partial product data, and for executing simultaneously partial population countings for the population counting elements when in the population counting mode to produce partial counted data; and means for adding the partial product data from said multiplying sub-units and outputting a multiplication result of the multiplicand data and the multiplier data when in the multiplication mode, and for adding the partial counted data from said multiplying sub-units and outputting a population counting result of the population counting input data when in the population counting mode.

5. A multiplying unit according to claim 4, wherein at least one of said plurality of multiplying sub-units is a first type multiplying sub-unit, each said first type multiplying sub-unit comprising: a first sub-unit register, having bit positions, for storing one of the multiplier elements in the multiplication mode, and for storing one of the population counting elements as a corresponding population counting element in the population counting mode; a second carry save adder, operatively connected to said first sub-unit register, for counting the number of "1" bits in the corresponding population counting element stored in said first sub-unit register by simultaneously inputting every pair of bits in the corresponding population counting element, each pair of bits stored in two adjacent bit positions in said first sub-unit register, and for outputting an elemental sum and elemental carry data for each pair of bits of the corresponding population counting element; a second sub-unit register for storing and outputting the multiplicand data in the multiplication mode, and for generating and outputting all "0" bits in the population counting mode; a decoder, operatively connected to said first sub-unit register, for outputting decoded signals required for multiplying the multiplicand data and the multiplier element upon inputting the multiplier element from said first sub-unit register in the multiplication mode, and for generating and outputting all "0" bits in the population counting mode; a multiple generator, operatively connected to said second sub-unit register and said decoder, for generating and outputting shifted data by combining the multiplicand data from said second sub-unit register and the decoded signals from said decoder in the multiplication mode, and for generating and outputting all "0" bits in the population counting mode; summing means, having a first carry save adder and a first carry propagate adder, for outputting the partial product data of 
the partial multiplication of the multiplicand data and the multiplier element upon adding the shifted data from said multiple generator in the multiplication mode, and for outputting the partial counted data of the number of "1" bits in the population counting element upon adding the elemental sum and elemental carry data from said second carry save adder in the population counting mode; and selector means for selecting one of the shifted data from said multiple generator and the elemental sum and elemental carry data from said second carry save adder to send to said summing means in accordance with the multiplication mode and the population counting mode, respectively.

6. A multiplying unit according to claim 5, wherein at least one of said plurality of multiplying sub-units is a second type multiplying sub-unit, each said second type multiplying sub-unit comprising: a third sub-unit register for storing and outputting one of the multiplier elements in the multiplication mode; a fourth sub-unit register for storing and outputting the multiplicand data in the multiplication mode, and for generating and outputting all "0" bits in the population counting mode; and a decoder, operatively connected to said third sub-unit register, for outputting decoded signals required for multiplying the multiplicand data and the multiplier element upon inputting the multiplier element from said third sub-unit register in the multiplication mode, and for generating and outputting all "0"
bits in the population counting mode.

7. A multiplying unit according to claim 1, wherein said dividing means divides the multiplier data into a first quantity of multiplier elements when in the multiplication mode, and divides the population counting input data into the first quantity of population counting elements when in the population counting mode.

8. A multiplying unit according to claim 2, wherein said dividing means divides the multiplier data into a first quantity of multiplier elements when in the multiplication mode, and divides the population counting input data into the first quantity of population counting elements when in the population counting mode, and wherein the number of said first type multiplying sub-units equals the first quantity.

9. A multiplying unit according to claim 4, wherein said dividing means divides the multiplier data into a first quantity of multiplier elements when in the multiplication mode, and divides the population counting input data into the first quantity of population counting elements when in the population counting mode.

10. A multiplying unit according to claim 5, wherein said dividing means divides the multiplier data into a first quantity of multiplier elements when in the multiplication mode, and divides the population counting input data into the first quantity of population counting elements when in the population counting mode, and wherein the number of said first type multiplying sub-units equals the first quantity.

11. A multiplying unit according to claim 10, wherein all of said plurality of multiplying sub-units are of the first type multiplying sub-unit.