MODULAR ASYNCHRONOUS OPEN-ENDED SYSTEM Yu. L. Vishnevskii, V. E. Kotov, and A. G. Marchuk
UDC 51:681.3.06
INTRODUCTION

Developing the next generation of computers is a difficult challenge. The accomplishments of today's technology allow substantial improvements in the specifications of individual computer components and peripheral devices. At the same time, there is a contradiction between advanced decisions in hardware design and obsolete system architectonics, programming languages, operating systems, programming systems, program and hardware design strategies, and the level and methods of user--system interaction. The broadening application spectrum of computer technology and its improved integrated performance and utilization efficiency call for a scientific analysis of the related problems of architectonics, programming, and system design strategies.

In [1], the concept of modular asynchronous open-ended systems (MARS) was offered, and the major trends in this research were outlined. The main way to heighten productivity in a MARS system is by paralleling program execution at all levels and phases, as implemented in the hierarchical paralleling of the computer system and the processes in it. The system is represented as a hierarchy of subsystems, each paralleled into a relatively small number of component subsystems. The process structure has a similar design. In a hierarchical system, an individual device (or subsystem) that has its local tools for processing, data storage, local commutation, and autonomous control becomes an independent, functionally complete computational module.

This does not imply similarity of the modules. On the contrary, the subsystems and components must be heterogeneous and functionally specialized. This calls for a more detailed analysis. Expanding the application ranges of computers and the increased complexity of the problems they handle have made general-purpose computers, so to speak, "equally inefficient" with respect to all problems.
While being capable of implementing any algorithm, a general-purpose computer cannot compete with specialized processors on specific (and generally broad) problem classes. Such specialized systems are at present concentrated on the processing of files, the development of control devices and instruments, the interpretation of high-level languages, and artificial intelligence problems, but their application areas will certainly expand. Even today, a specialized processor is at least ten times less expensive than a general-purpose counterpart of similar capacity. This approach calls for a thorough analysis of problems so as to identify the basic classes with substantially different structures of the data being processed, the operations, the control structure types, and the quantitative features of the computational process elements causing major differences. Developing the data and algorithm specifications for these areas will allow designing problem and system software in an organized way, subsequently approaching the implementation of special processors and building these into uniform systems.

Improved communication with computers, as well as other conventional and nonconventional computer applications, presupposes enhanced performance of the coupling modules and their specialization. High performance in processing specialized data structures is indispensable for on-line graphics, image processing, pattern recognition, logical inference, and other subsystems of future powerful computer complexes. Developing a computer system from specialized hardware and software components requires a new approach to their architecture.

Translated from Kibernetika, No. 3, pp. 22-29, May-June, 1984. Original article submitted December 15, 1983.

The architecture of modular asynchronous open-ended systems is viewed as the totality of restrictions imposed on the module structure, operating
0011-4235/84/2003-0328$08.50 © 1985 Plenum Publishing Corporation
and interaction dynamics, and the methods of putting modules together into processing configurations. Architectural studies bear a direct relationship to the development of the processing component structure, the operating system and programming system, and the languages of parallel programming. This approach allows accommodating the reciprocal requirements of software and hardware in a system architecture.

Basically, computers are still made up of registers, memories, and valves. The nomenclature of basic devices such as the asynchronous channel, delay unit, direct and inverse commutator, control module and device, arbitrator, etc., developed in designing the parallel processor [2], and the methods used to bring these components together simplify the embodiment of algorithms in hardware, while retaining a sparing strategy of realization.

The model of parallel computations, which is inherent in the system architecture, is implemented as the base language [3] -- the project's conceptual language. It has well-developed means of parallel and asynchronous data processing at all levels -- operations, operators, and modules. The important features of the language are: its representation as a set of four sublanguages (those of control, statement, memory, and structural object operations); unification of the sublanguage grammatical structures; use of Petri net algebras and the algebra of complexes to formalize the control semantics and operations with structural objects; and the mechanism for "leaking" of operations and functions applied to structural objects.

The connecting and controlling component of the MARS system is its operating system (OS). The modular principle and parallelism of the hardware and the absence of a unified system of commands place specific requirements on the OS, including the transferability and adjustability of the configuration for a hardware or software implementation of the individual system components. The operating system consists of the core and the programming system.
The core involves the implementation of standard means for handling structural objects, hardware and configuration control, reliability safeguards, resource management, and process control. The programming system covers the specifics of the hardware and the configuration with a standard interface.

As the logic of hardware and software systems becomes more complex, design is becoming increasingly expensive. Hardware support for OS implementation and a system for automated hardware design become essential. The first objective is served by the language of functional architecture description, POLYaR [4], which is a language of parallel programming. It has flexible means for handling complex data structures, asynchronous programming tools, and powerful hardware capacity with an orientation toward major program complexes. The automation of hardware design requires a multilevel system comprising languages of logical design of various levels and tools for mapping the descriptions into languages of a lower level, down to the level of basic circuits. An important feature of the system is the existence of tools for device adjustment based on their descriptions and simulation.

The MARS system is an open-ended system of compatible functional modules, i.e., programs, microprograms, and hardware combined with a common OS and capable of static and dynamic reconfiguration and adaptation to applications in conventional and nonconventional areas. The system features parallelism and asynchronous operation of modules, a high level of user communication, and a high degree of order in the specifications of modules, interfaces, languages, software packages, operational environments, and hardware and software development tools.
1. ARCHITECTURAL PRINCIPLES
The basic tendency of computer architecture is the paralleling and combining of different processes along with an increasing number of devices. Even today, a conventional architecture features a degree of parallelism measured by 10^2-10^3 microfunctions performed simultaneously (usually in a common flow of instructions). The hardware complexity is such that attempts at increasing the parallelism of computational processes now cause problems that cannot be solved by the usual empirical or intuitive methods. Furthermore, continued paralleling of processes and reproduction of devices poses the problem of commutation and exchange. The performance of computers is now limited less by component speed than by delays in communication lines and commutators. A transition to restraints on linkage topology (speed and similar principles) either fails to resolve the overload of communication lines or reduces the class of computational processes that can be effectively implemented by the machine. The natural way out of this situation is a hierarchical paralleling of the computer system and the processes occurring in it. The system is
presented as a hierarchy of subsystems, each paralleled into a relatively small number of component subsystems. Any hierarchically organized system needs a distribution of resources among its components, localization of some of the connections, and decentralized control. We thus arrive at an important principle: partial resource decentralization (primarily of data processing and storage) with decentralized communications and control. In a decentralized hierarchical system, an individual device (or a subsystem) which has means for data processing and storage, a local commutation facility, and autonomous control becomes an independent, functionally complete computational module, which can in turn consist of submodules aimed at the functions mentioned earlier.
The general system structure has a regular outline, but its component modules can be specialized or attuned to the performance of specific functions in the system. This provides an opportunity for using software-hardware modules to perform those functions of the operating system and general system support that cannot be embodied in integrated circuits. In particular, at higher hierarchical levels, special modules for peripheral control, translation modules, and modules for systemic data bases can be included. Instead of the processes mentioned earlier, a large number of processing stages are identified, and the degree of their combination is increased. The specialization of lower-level modules makes it possible to organize software-hardware "macroprogramming," and thus to raise the level of the machine language. (Apart from paralleling, this is the major trend in the development of computer architecture.) Finally, including modules aimed at the implementation of special algorithms (Fourier transforms, associative search, etc.) and at various data types (symbolic information, vectors, matrices, lists, trees, etc.) will improve the system performance for individual problem classes. The functional heterogeneity of the system combined with specialization of the modules will thus improve both computational and programming efficiency.

The different modules, especially at higher hierarchical levels, can be built up of submodules belonging to the same standard set. In addition, a static system reconfiguration can be done by replacing software modules with their hardware embodiments. This makes the system open-ended in the sense that its configuration, properties, and capabilities can be selected and modified depending on the application area and operation mode.
The open-ended nature of the system and the regular pattern of the general structure, the modular principle, and the standard interface between modules make it possible to proceed to a "large-block" computer system design, greatly simplifying development, design, and user adaptation and modification to the operational environment, making the process much more cost-effective. This approach offers improved prospects for overall system development.

The main architectural principles -- hierarchical paralleling and decentralization, a higher level of machine language, the modular structure, system heterogeneity and module specialization, and the open-ended nature -- are all practiced in existing computers, although to varying extents. The feasibility of the architectural principles proposed in the MARS system is ensured by unified tools of system (and software) composition from modules and the use of a common asynchronous organization of control in the system and programs. The mechanisms of asynchronous organization of calculations have been studied and developed with the use of formal models of parallel computation; currently they are being elaborated on the basis of a parallel programming language which constitutes the conceptual base of the MARS architecture. The sequence of MARS architecture development is the following: computational model -- programming languages -- system architecture.

The computational model represents a parallel program, process, or system as a hierarchy of modules (software modules, process modules, device modules). The modules of different hierarchical levels normally have different control, processing, memory, and commutation mechanisms because of the different requirements and capabilities of effective realization at the different levels. But the mechanisms have a common base, which is a common principle of their design and operation. Three main levels of hierarchy are identified: composite modules, simple modules (operators), and operations.
The control mechanisms, which have not been adequately developed in the existing methods of organizing calculations, are given the most attention in the project. Most existing programming systems and languages use sequential or sequential-parallel calculation principles, where sequential control is supplemented by parallel mechanisms. This model proceeds
from an inverse, asynchronous principle: all modules are assumed to be a priori independent and parallel, and the process of calculation is formed by means of restraints imposed on the interaction of modules. These restraints are formulated in terms of execution readiness conditions associated with each module. Depending on the terms in which these are stated, three "pure" types of asynchronous control are distinguished: (1) flow control, where the readiness condition is met when the input data for the module are available; (2) static control, where the readiness depends on program or system events, such as the completion of a module's operation or its start, interruption, etc.; and (3) dynamic control, where the readiness condition is a logical statement depending on the values of the input data. All three types of asynchronous control are used in this model: at the operation level, flow control; at the operator level, a combination of static and dynamic control; and at the level of composite modules, all three types of control.

The mechanisms of data exchange between modules are closely linked with the control mechanisms. The exchange occurs through the memory, which is organized differently at different levels of the program and system hierarchy. The common memory is a set of locations accessible to all submodules. Reading a data element out of a location does not destroy the information, while registering a new data item does. The distributed memory is a set of channels linking the inputs and outputs of the submodules directly. Data form queues in channels, with each reading operation reducing the queue by one element and each recording placing the new data at the end of the queue.

[Fig. 1: schematic of the MARS system -- arithmetic, logical, special, data base, and communication processors connected to a common memory complex.]
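As an illustration, the flow-control discipline and the queue semantics of the distributed memory described above can be sketched in a few lines of Python. This is a minimal sketch, not part of the MARS project; all class and function names here are hypothetical.

```python
from collections import deque

class Channel:
    """Distributed-memory link: a queue between submodule outputs and inputs.
    Each write places data at the end; each read removes one element."""
    def __init__(self):
        self.queue = deque()
    def write(self, value):
        self.queue.append(value)
    def ready(self):
        return len(self.queue) > 0
    def read(self):
        return self.queue.popleft()

class FlowModule:
    """Flow control: the readiness condition is met as soon as
    input data are available on every input channel."""
    def __init__(self, func, inputs, output):
        self.func, self.inputs, self.output = func, inputs, output
    def ready(self):
        return all(ch.ready() for ch in self.inputs)
    def fire(self):
        self.output.write(self.func(*[ch.read() for ch in self.inputs]))

# Two a priori independent modules, restrained only by data availability.
a, b, s, out = Channel(), Channel(), Channel(), Channel()
modules = [FlowModule(lambda x, y: x + y, [a, b], s),
           FlowModule(lambda x: 2 * x, [s], out)]
a.write(3); b.write(4)
while any(m.ready() for m in modules):    # asynchronous firing loop
    for m in modules:
        if m.ready():
            m.fire()
result = out.read()
print(result)   # 14
```

Note that the calculation order is never programmed explicitly: the second module fires only once the first has placed data into its input channel.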
2. THE STRUCTURE OF THE MARS SYSTEM
The system configuration alternative being developed is a computer system oriented toward application in computing centers specializing in scientific research. The system comprises a series of specialized processors operating on a general field of main memory, each having its own large working memory. The processors interact with the memory and among themselves through highly efficient data transmission channels. From an analysis of the problems currently handled by computer centers, it is possible, even now, to get a general notion of the types of main processors that are required: a general-purpose arithmetic processor capable of rapidly processing vector and scalar numeric data, a logical or symbolic processor, a data base management processor, an OS processor, and a communications processor. Figure 1 gives a schematic illustration of the MARS system. The memory is a multi-input system with a high flow capacity, including a dynamic memory planner, hardware for the maintenance of synchronization primitives (traffic switches, messages, control variables), and a restoration system.

The multiplicity and complexity of the functions performed by the operating system in modern multiprogramming computer networks require hardware maintenance. This does not mean that the entire OS is concentrated in a single processor; on the contrary, it implies a hierarchical and decentralized OS structure, with localization of the critical data and function components
in the processor. The specific quality of the data, the operating system, and the operations performed on the data should be taken into consideration in the processor design. The OS processor controls the hardware assignments and resources, verifies data management, and oversees the correspondence between programs and current configurations, as well as system data preservation and recovery.

The general-purpose numeric processor [2] is focused on numeric data. Its structure allows high productivity on vector and scalar data. The new architectural ideas are: (1) a multilevel structural organization based on hierarchical microprogramming, allowing, among other things, the interpretation of statements at the microprogramming level; (2) the division of the processor into subsystems operating in parallel and linked by asynchronous channels; (3) hardware implementing the simultaneous execution of several microprograms on a common subsystem in the resource-sharing mode; and (4) a high-level machine language which has mechanisms for parallel programming.

The logical processor is aimed at a broad class of nonarithmetic problems and textual processing. Problems of this class typically have large volumes of complex data structures (usually lists and trees) and a high degree of asynchronous paralleling. The conventional sequential architecture does not allow using the natural capacity for paralleling with these problems, which is understandable in view of the traditional "numeric" orientation of computers.

The data base management processor has to map the logical data structures of the system and the user into a physical structure of storage on magnetic carriers and other recording vehicles and to provide access to data on user and system requests. In addition, it is responsible for data preservation, optimum location, accumulation of requests, statistics, and maintenance of data management operation.
The communication processor is responsible for the interface between the system and external, peripheral, and network subsystems. In particular, its functions include standard interaction protocols with peripheral devices, local and global computer systems, and specialized and general interface processors. The problem orientation of the system is determined by the set of special processors, their structure, and the capacity of its individual components.
3. THE STRUCTURE OF THE NUMERIC PROCESSOR
The numeric processor under development as an element of the MARS system realizes a new approach to highly effective processors. It is aimed at surmounting the typical shortcomings of the conventional conveyer and matrix architectures. These processors handle large data files of a regular type quite effectively, but lose in performance when scalar arguments or smaller files have to be processed. Another shortcoming is the dependence of their performance on the organization of the program and the data structure.

The subsystems of the universal numeric processor are: the address subsystem, the computational, memory, and control subsystems, and the peripheral subsystems which link the processor with the external memory. The processor subsystems are connected by asynchronous channels (AC), which are "queue" memory structures. The subsystem for which a channel is the output channel registers results in it, and the subsystem for which the same channel is the input channel uses its results as arguments. The asynchronous channels serve the asynchronous functioning of the subsystems and smooth out the difference in the pace of argument formation and use.

Figure 2 illustrates the macrostructure of the processor. Note that the organization is based on the flow principle. The flow of program instructions comes from the memory into the control subsystem (CS), where the instructions are interpreted and macrooperations for the address subsystem and the computing subsystem are formed. The macrooperations initiate the respective microprograms, or fragments. The complexity of fragments may vary, from unit operations up to complex functions such as Fourier transforms, scans, matrix multiplication, the formation of complex objects, etc. The address fragments, i.e., the fragments performed in the address subsystem (AS), generate address flows, from which operand flows are formed from the working memory. In a
[Fig. 2: macrostructure of the numeric processor -- the memory subsystem, address subsystem, computing subsystem, and control subsystem, linked by asynchronous channels.]
typical situation, they go into the computing subsystem (COMPS), where they are processed by computing fragments. As a result, output operand flows are formed, which are registered in the memory according to the record address flows produced by the address fragments.

The processor has the following architectural features. Each subsystem consists of several layers. Any of the fragments specified in an instruction can be performed on each layer, and all layers function in parallel using common subsystem resources. The multilayer structure of a subsystem results in effective loading of its functional devices when performing not only "vector" operations but also scalar operations (or operations with short vectors), which is provided by the possibility of combined execution of such operations and allows organizing a "cluster" of fragments executed in parallel. As a result, formulas or expressions which form a chain (graph) of fragments can be executed. The arguments of an expression can be either scalar or vector structures. The interaction of fragments occurs through the circular asynchronous channels shown in Fig. 2. Some fragments send their results into such a channel and others use these results. The intermediate results of formulas are thus not memorized in the working memory, contributing to higher performance and saving memory space.

The organization of links between the subsystems is illustrated by Fig. 3. The set of channels through which the subsystems exchange data flows is viewed as a fast buffer memory consisting of dynamic registers. Each of the registers is a memorizing structure of the "queue" type. The fragments executed in the system make use of a set of registers as input and output channels. The correspondence between subsystem layers (and therefore the fragments executed on them) and the registers is not defined in a static, invariable manner but is determined dynamically as a fragment is initiated on a particular layer.
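The pool of queue-registers and their dynamic, delay-on-exhaustion assignment can be sketched as follows. This is a minimal illustration under the stated assumptions; the class names and pool size are invented for the example.

```python
from collections import deque

class QueueRegister:
    """One dynamic register of the 'queue' type in the fast buffer memory."""
    def __init__(self):
        self.free = True
        self.data = deque()

class AsyncChannel:
    """An asynchronous channel: a pool of queue-registers serving one
    communication function; registers are assigned dynamically within
    the channel as fragments are initiated."""
    def __init__(self, size):
        self.registers = [QueueRegister() for _ in range(size)]
    def allocate(self):
        for reg in self.registers:
            if reg.free:
                reg.free = False
                return reg
        return None           # no free register: the instruction is delayed
    def release(self, reg):
        reg.data.clear()
        reg.free = True

ch = AsyncChannel(size=2)
first = ch.allocate()
first.data.append(42)         # one operand registered in a cycle
second = ch.allocate()        # a second register is assigned dynamically
third = ch.allocate()         # pool exhausted: None (execution is delayed)
value = first.data.popleft()  # one operand read out in the same cycle
print(value)   # 42
```

The point of the sketch is that a fragment never names a physical register: it asks the channel for one, and waits when none is free.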
Importantly, the layer itself is assigned dynamically from those which are unoccupied. The dynamic assignment of resources (queue-registers and layers) has important consequences: there is no need to program the resource utilization, and the level of the machine language is thus raised. The notion of "asynchronous channel" needs definition: it is a set of queue-registers performing the same communication function. The types of channels are distinguished according to this function: the channels linking the memory with the computing subsystem, through which the operand flows move from the memory into the COMPS; the circular channels linking the input and output data flows in the subsystems; etc. In one cycle, one operand can be registered in one of the queues of an asynchronous channel, and one operand can be read out of the (same or a different) queue at the same time. The dynamic distribution of queue-registers occurs only within a channel. A typical machine instruction has the following structure:
⟨operation⟩ ⟨register type⟩
It is executed as follows. Among the free registers of the type indicated in the instruction, a register is assigned into which the results of the operation (fragment) will be
[Fig. 3: organization of links between the subsystems -- the external memory, memory subsystem, commutator, address subsystem, and control subsystem exchange data flows through queue-registers.]
placed. The physical number of this register, along with the register type and the data type assigned to the operation results, is recorded at the "top" of the stack of registers. If there is no free register of the required type, the execution of the instruction is delayed until one is available. The physical numbers of the registers used as arguments of the operation -- more exactly, containing the arguments for the operation -- are taken from the "top" of the register stack before the number of the result register is placed onto the stack. The data types given with the register numbers take part in the formation of the operation being executed by supplementing the "operation" field in the instruction. If the operation is a two-place operation, two argument-register numbers are read out of the stack, and the stack top indicator is moved two positions down. The operation, after it is formed, defines the subsystem and indicates which of the fragments in the subsystem should be activated. The subsystem layer on which the fragment will be executed is assigned dynamically, out of the pool of free layers.

An instruction defines a vector operation; i.e., it is assumed that the arguments of the operation are vectors of values of elementary types formed in the input registers. An operation (or a fragment) is performed as long as there are data left in the input register. In the special case of the input register containing just one value, we have an ordinary operation on scalar arguments.

The flows of addresses of input and output arguments of a statement are formed by address instructions. A typical address instruction for the formation of an input flow (a reading instruction) has the following structure:
⟨operation⟩ ⟨address constant⟩ ⟨register type⟩

and for writing the resulting flow into the memory,
⟨operation⟩ ⟨address constant⟩

The address constant can have different meanings: it can denote the address of a variable or an elementary constant, or be the address under which a descriptor of a complex data structure is kept in the working memory. The address instruction allows various types of addressing, namely:

-- absolute mathematical (an absolute address within the mathematical memory of the problem);

-- absolute systemic (an absolute address in the region of the systemic mathematical memory accessible to the problem);

-- relative (absolute mathematical or systemic addresses are formed by combination with the base address in the base register indicated in the instruction);
-- according to the stack (the absolute address is formed according to the value of the stack top indicator, which specifies a location in the working memory); and
-- indirect, which is used in combination with the foregoing types: first an address is formed which is used to read out of the memory the value that is taken as the execution address. (Such indirect addressing can be done over more than one step.)
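A hedged sketch of how these addressing types might resolve to an execution address is given below; the memory layout, the marking of indirect cells, and the function itself are illustrative assumptions, not the machine's actual format.

```python
def resolve(mode, operand, memory=None, base=0, stack_top=None,
            indirect_cells=frozenset()):
    """Resolve an address constant under one of the addressing types."""
    if mode in ("absolute-mathematical", "absolute-systemic"):
        return operand                  # absolute address as given
    if mode == "relative":
        return base + operand           # combined with the base register
    if mode == "stack":
        return stack_top                # taken from the stack top indicator
    if mode == "indirect":
        addr = memory[operand]          # the value read becomes the address
        while addr in indirect_cells:   # may chain over more than one step
            addr = memory[addr]
        return addr
    raise ValueError(mode)

mem = {10: 20, 20: 30}                  # cell 20 is itself indirect
print(resolve("relative", 5, base=100))                          # 105
print(resolve("indirect", 10, memory=mem, indirect_cells={20}))  # 30
```

The second call shows two-step indirection: cell 10 points at cell 20, which in turn yields the final execution address 30.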
The usual descriptor mechanism is broadly used. The descriptor of a complex object is not just an indicator of the memory location where the object or its components and data types are to be found; it also defines the selection function, that is, contains the parameters necessary to form a new structure from the original one. Elements of an object can themselves be descriptors, so that complex hierarchical structures can be formed. In the address subsystem of the processor, a set of descriptor registers is kept to carry the initial and current parameter values. At each operation cycle, the addresses of the current elements of complex structures can be formed, and several objects whose elements are used partially can be maintained within the working field. The descriptor register contains the following parameters: the base address with structure element type indicators, the upper and lower boundaries of the object's location in the memory (this information is used to avoid going beyond the object boundaries), the values of the three indexes and the boundaries of these indexes, the number of elements of the structure being formed, and the address of the current element, formed according to the expression
A = B + I·i + J·j + K·k,
where B is the base address and i, j, and k are the current values of the indexes that depend on the element number. The possibility of a conditional modification of current index values as a function of the values of other indexes allows executing efficient selection functions.

A completed construct of the machine language is the operator, which is a program object that has the following features and restrictions:

-- no access to other program objects such as operators or higher-level objects; some segments of the instruction sequence can be performed several times, and embedded loops are allowed;
-- the operator can use all resources (layers and registers) of the processor but must be programmed so that these resources are sufficient for its execution; and
-- an operator can represent a program for calculating complex formulas whose arguments are vectors of indefinite length.
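The element-address computation A = B + I·i + J·j + K·k performed by the descriptor register, together with its boundary check, can be sketched as follows. The multiplier values and the object layout are hypothetical; only the formula and the boundary guard come from the text.

```python
class Descriptor:
    """Sketch of a descriptor register: base address, per-index multipliers,
    and object boundaries guarding against out-of-bounds addresses."""
    def __init__(self, base, multipliers, lower, upper):
        self.base = base                  # B: base address
        self.multipliers = multipliers    # (I, J, K)
        self.lower, self.upper = lower, upper

    def address(self, i, j, k):
        I, J, K = self.multipliers
        a = self.base + I * i + J * j + K * k
        if not (self.lower <= a <= self.upper):
            raise IndexError("address outside the object boundaries")
        return a

# A hypothetical 3x4x5 object stored from address 1000.
d = Descriptor(base=1000, multipliers=(20, 5, 1), lower=1000, upper=1059)
print(d.address(2, 3, 4))   # 1000 + 20*2 + 5*3 + 1*4 = 1059
```

Forming a new structure from the original one then amounts to supplying a different set of multipliers and index boundaries, without moving the data.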
The control layer in the CS -- which selects the operator instructions from the memory, decodes them, converts them to a form that can be received by the control devices of the subsystems, and performs the dynamic resource allocation -- forms a conveyer capable of processing one instruction per cycle. Thus, if a flow of instructions consists mainly of specifications of operations on scalar arguments, the processor performance will approach one instruction (operation) per cycle. Considering the substantial hardware support in the formation of the addresses of complex structure elements, the processor's performance measured in conventional computer instructions is even higher, thanks to a reduction of the operations forming addresses, loops, and conditional transfers implementing the selection functions.

The processing subsystems (the address and computational subsystems) follow a similar scheme (Fig. 4). The methodology and technology for programming the fragments, subsystem control, fragment memory organization, linkage of devices, control, and adjustment are all handled similarly in these subsystems. A subsystem, considered as an independent functional module, consists of submodules responsible for the functions of processing, control, memory, and commutation. The processing functions are performed by functional devices (FD). In order to parallel these functions, the subsystem comprises several FDs, each responsible for certain elementary operations. The composition of the FDs in the subsystem and the data structures they handle determine its functional specialization. FDs implementing their functions in several operation cycles have a conveyer organization (the principle of combined execution of operations, common in computer practice).
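The conveyer organization of an FD can be illustrated by a toy two-stage pipeline: an operation takes several cycles to complete, but a new operand can enter every cycle, so operations are executed in a combined fashion. The stage functions and latencies here are invented for the sketch.

```python
from collections import deque

class PipelinedFD:
    """Toy conveyer functional device: operands pass through the stages
    over several cycles while a new operand may enter each cycle."""
    def __init__(self, stages):
        self.stages = stages                      # one function per stage
        self.pipe = deque([None] * len(stages))   # in-flight operands

    def cycle(self, operand=None):
        """Advance one cycle; return a finished result or None."""
        self.pipe.appendleft(operand)
        done = self.pipe.pop()                    # leaves the last stage
        self.pipe = deque(
            f(x) if x is not None else None
            for f, x in zip(self.stages, self.pipe)
        )
        return done

fd = PipelinedFD([lambda x: x + 1, lambda x: x * 2])  # two-stage FD
results = [fd.cycle(v) for v in (3, 4, None, None)]
print([r for r in results if r is not None])   # [8, 10]
```

The two operands overlap in the pipeline: the second enters while the first is still in flight, which is exactly the combined execution mentioned above.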
[Fig. 4: structure of a processing subsystem -- recording ports, a matrix commutator, reading ports, functional devices, the request matrix, the point-of-entry mechanism, and dynamic resource allocation.]
The functional devices can be linked in various ways. In the processor, the connection between the FDs of a subsystem is based on a high-speed matrix commutator. Analysis has shown that, with a limited number of FDs, this commutation has advantages as regards speed versus hardware cost.

For storage of fragment microprograms, a fragment memory (FM) is included in the system; it is subdivided into modules according to the number of FDs requiring control. The elements of the fragment programs for an FD are located in the respective FM module. The instructions for an FD are placed in the memory compactly, one after another. To determine the time when an instruction should be called, an indicator is used which is included in each instruction and defines the point in the cycle when the next instruction should be activated.

Each FD of the subsystem has its corresponding control module (CM), which calls the current instruction from the chain of instructions according to the entry point defined. The chain of instructions, or linear segment, is a simple stream of instructions free of conditional transfers or loops. The end of a linear segment is indicated by a zero value of the indicator in the last instruction. The control module can monitor several linear segments initiated by several entry points, which can be the same or different. In a static combination regime, the same linear segment can be executed multiply, or different segments can be executed simultaneously. The linear segment is an elementary microprogram for scalar argument processing; when it is repeated, a vector operation is realized.

The sequence of addresses of the entry points for a CM is formed by the control device (CD). The sequence of initiation of the individual entry points for the fragment is determined by the course of input data processing.
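The described scheme of a control module stepping through a linear segment can be sketched as follows. The field names and opcodes are our assumptions; the paper specifies only that instructions lie compactly one after another and that each carries an indicator giving the activation time of the next instruction, with zero marking the end of the segment.

```python
# Sketch of a control module (CM) executing one linear segment stored
# compactly in fragment memory (FM). Each instruction is modeled as a
# pair (opcode, indicator): the indicator is the number of cycles after
# which the next instruction of the segment is activated; a zero
# indicator marks the last instruction of the linear segment.

def run_linear_segment(fm, entry_point):
    """Yield (cycle, opcode) activations for one linear segment."""
    cycle = 0
    pc = entry_point
    while True:
        opcode, indicator = fm[pc]
        yield cycle, opcode
        if indicator == 0:        # end of the linear segment
            return
        cycle += indicator        # when the next instruction starts
        pc += 1                   # instructions lie one after another

# A three-instruction segment: activations at cycles 0, 2, and 3.
segment = [("mul", 2), ("add", 1), ("store", 0)]
print(list(run_linear_segment(segment, entry_point=0)))
# -> [(0, 'mul'), (2, 'add'), (3, 'store')]
```

Repeating such a segment over successive elements would realize a vector operation, as the text notes; a CM monitoring several entry points would interleave several such generators.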
For organizing this sequence, the CD makes use, in particular, of conditional and unconditional control transfers, forms transfer addresses, loop parameters, and condition flags, and executes interrupt processing; that is, it performs the same operations as a typical control processor.

Other features of the subsystem control organization are determined by the layering of the subsystem. As mentioned, a layer can be assigned for the execution of an isolated fragment. Several fragment processors can thus be in an active state in the system at the same time. The subsystem stratification leads to a stratified control, namely of the part supervising the selection of fragment instructions. The fragment processes make use of common resources. The distribution of resources among fragments is accomplished by a special device, the so-called subsystem arbitrator.

The memory has a conventional organization.
It consists of several modules with random access. Each module consists of several sections. Operations in the sections can overlap in time, stepping up the servicing of requests to the memory modules. A paged working memory is used in the processor.

LITERATURE CITED

1. G. I. Marchuk and V. E. Kotov, Modular Asynchronous Developing Systems [in Russian], Parts 1 and 2, Novosibirsk (1978). (Preprints of the Computing Center, Siberian Branch, Academy of Sciences of the USSR, Nos. 86, 87.)
2. Yu. L. Vishnevskii, V. E. Kotov, and A. G. Marchuk, "Architecture and principles of organization of a parallel processor for numeric data," in: High-Performance Computer Tools for Data Processing [in Russian], Nauka, Novosibirsk (1981), pp. 5-14.
3. A. V. Bystrov, N. N. Dudorov, and V. E. Kotov, "Base language," in: Programming Languages and Systems [in Russian], Nauka, Novosibirsk (1979), pp. 85-106.
4. T. I. Lel'chuk and A. G. Marchuk, Language for Description of Functional Architecture of Computer Systems (Model and General Principles) [in Russian], Novosibirsk (1981). (Preprint of the Computing Center, Siberian Branch, Academy of Sciences of the USSR, No. 285.)
ORGANIZATION OF COMPUTATIONAL PROCESS IN SOLVING APPLIED PROBLEMS ON A RECURSIVE MULTIMICROPROCESSOR SYSTEM

V. A. Myasnikov, M. B. Ignat'ev, and V. V. Pil'chakov

UDC 519.2
One of the aims in developing multimicroprocessor computer systems with recursive organization [1] is to solve scientific and technical problems, including problems of large dimension. The versatility of the architecture facilitates the solving of a mixture of unconnected problems of average complexity in the multiprogram mode. For solving problems with a large time complexity, it is necessary that the computational algorithm operate in parallel. It is therefore of interest to specify the requirements on parallel algorithms intended for computer systems with a given organization, and to carry out numerical experiments on model problems.

REQUIREMENTS ON PARALLEL ALGORITHMS

The quality of a parallel algorithm is usually estimated with the aid of the following system of criteria [2]: Sk is the speed-up of computations, Sk = T1/Tk, where T1 is the time complexity in processing the algorithm with the aid of one virtual processor, and Tk the time complexity when it is processed by k processors (the duration of each operation is estimated on the basis of a uniform weight criterion); Ek is the relative load of the processors (Ek = Sk/k), which specifies the effectiveness of utilization of the computer structure.

In constructing a parallel algorithm it is quite possible to increase the number of operations as compared to a sequential algorithm that properly solves the same problem. The above system of criteria does not take this circumstance into account. In fact, Sk specifies only the possibility of parallel execution of an algorithm. If the algorithm has k parallel branches with the same amount of computation, then the speed-up is maximal, Sk = k. But this criterion does not estimate the decrease in solution time when a parallel algorithm is used as compared to any sequential algorithm that is efficient for the given problem.
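The two criteria above are simple to compute. A minimal sketch, with variable names of our own choosing:

```python
# Standard criteria for a parallel algorithm:
#   speed-up      Sk = T1 / Tk
#   relative load Ek = Sk / k
# where T1 is the time complexity on one virtual processor and
# Tk the time complexity on k processors (uniform operation weights).

def speed_up(t1: float, tk: float) -> float:
    return t1 / tk

def efficiency(t1: float, tk: float, k: int) -> float:
    return speed_up(t1, tk) / k

# k identical parallel branches give the maximal speed-up Sk = k
# and full processor load Ek = 1:
t1 = 1000.0
k = 8
tk = t1 / k
print(speed_up(t1, tk))        # -> 8.0
print(efficiency(t1, tk, k))   # -> 1.0
```

Note that T1 here refers to the parallel algorithm run on one processor; as the text observes, a maximal Sk therefore says nothing about the gain over an efficient sequential algorithm, which may perform fewer operations.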
It is expedient to introduce yet another characteristic of a parallel process, one that accounts for the change in the solution time of a parallel computational process on a virtual machine as compared to a sequential computational process executed on the same machine. The sequential computational process must efficiently solve the problem under consideration. This characteristic can be called the effective speed-up Sk* and is defined by the formula

Sk* = T1*/Tk,

where T1* is the number of operations of a virtual machine in an efficient sequential algorithm

Translated from Kibernetika, No. 3, pp. 30-37, May-June, 1984. Original article submitted November 24, 1983.

0011-4235/84/2003-0337$08.50 © 1985 Plenum Publishing Corporation