Generating Test Programs from Syntax By W. It. Burkhardt, Camden, N . J . With 18 Figures
(Received October 27, 1966)
Summary. The many faces of programming and systems development demand an immense amount of mechanical routine work. The present paper tries to explain some areas where automation of many tasks may be of great help. One special area, where progress seems to lag behind unduly, can be found in debugging, testing, and diagnosing systems. Here we attempted the generation of programs automatically from a definition of a problem and the characteristics of programs for its solution by a software system, which has been specially designed for this purpose. I~ has been indicated how the ideas underlying this project may be applied successfully to other areas. Zusammenfassung. Bei der Programmierung und SystementwieMung wird zu einem erheblichen Umfang mechanisehe Routinearbeit erfordert. Der vorliegende Beitrag grenzt •erschiedene Gebiete ab, we mehrere Aufgabenl6stmgen automatisiert werden kSnnen. Ein Spezialgebiet, in welchem die Teelmiken auf niedrigem Stand zuriickgeblieben scheinen, finder sich im Analysieren und Auspriifen yon Systemprogrammen. Hier haben wir versucht, Programme dureh ein System automatiseh zu erzeugen, ausgehend yon einer Definition des Problems und den Charakteristiken seiner L6sung. Dieses System war zu diesem Zweck entwickelt worden. ]~'emer wird angedeutet, wie die Ideen, ant denen dieser Beitrag beruht, auf andere Gebiete erfolgreich angewendet werden kSnnen.
I. Introduction The d e v e l o p m e n t of c o m p u t e r p r o g r a m s for use in the solution of problems can be broken down into the following steps: 1. 2. 3. 4. 5.
F o r m u l a t i o n of the problems. F l o w - c h a r t i n g or defining solutions to the problems (programs). Coding the programs. Translating the p r o g r a m s to the absolute m a c h i n e level (if necessary). D e b u g g i n g and testing the programs.
F o r nontrivial cases (large a n d complicated p r o g r a m s a n d difficul~ problems) these steps are v e r y often repeated m a n y times to a c c o m m o d a t e changes which a p p e a r necessary when r u n n i n g a program. I t is obvious t h a t technical progress on a n y step results in high rewards. The present p a p e r is concerned with the debugging a n d testing stage of p r o g r a m development, especially with t r a n s l a t o r systems,
54
IV. H. BURKHARDT:
A. P r o g r a m
debugging
methods
A small private survey corroborated the anachronism of memory dumps as a computer program debugging tool [1], Octal memory dumps were the natural and equivalent debugging method for programs originally written in octal code (in the late 1940's and early 1950's). There can be no question about the usefulness of dumps as debugging tools even now; in many eases they can be the last resort in correcting knotty programming mistakes. B u t the method is, in general, out of date. The most serious disadvantages of machine language debugging are the following: 1. A deluge of useless material (a very few interesting results hidden in a bulk of unwanted numbers). 2. A static picture of the state of the program at the time of the dump request (a location used several times contains only the results of the last used operation). 3. The desired results are given in a rather unreadable form (for example, floating-point numbers in octal, octal locations instead of variable names, etc.). Several projects have been undertaken to ease these difficulties; for example: 1. The P D U M P features of the F O R T R A N II Basic Monitor System permit specification of starting and ending locations for a dump, thus attenuating Problem 1 b y giving a better ratio between useful results and useless numbers. They also simplify the static problem (Problem 2) by permitting short, intermediate dumps of memory sections instead of a less usefull post-mortem dump. 2. The trace routines, advocated by [2] generally solve the static problem but suffer likewise from Problem 1, giving too much undesired material. More valuable are restricted trace routines which edit only while control of a program runs in a predetermined range. 3. Solving Problem 3 does not impose insurmountable difficulties. Octal dumps of previously translated programs can make use of the symbol table from the translation process and of the Mlocation table from the loader. Most on-line trace routines should use at least rudiments of these tables. Now a debugging method is generally proposed, which is more closely related to modern, high-level language programming [3]. This method has been in use for several years and been proven very useful. It is based on the precept that any larger program is going to be translated anyway at least twice, before being fully debugged and running in an acceptable manner. Although programming in dear, high-level languages reduces the amount of checking out required b y reducing the number of mistakes, the logic of the program as a solution to the given problem must, in any ease, be verified. This part is the more important one, especially when using computers in conversational mode, if the debugging task is divided into checkout
Generating Test Programs from Syntax
55
of program logic and checkout of program coding 1. Even programs which require a high degree of hardware utilization efficiency (translators or simulators, for example) usually are created more quickly and conveniently b y programming and debugging in a higher-level language, with subsequent cleanup and hand-optimizing of the low-level object code. Debugging has now two main objectives: 1. To verify the logic of a problem solution (by keeping track of the flow of control in the program). 2. To verify the correctness of the code b y checking the validity of intermediate results. In many instances, both objectives can be combined successfully and are accomplished b y outputs at the important locations in a program. I f the program is in: 1. Assembly language, this is done by calling an output subroutine to write identification of the calling point and the contents of the registers. 2. In high-level language, b y inserting o u t p u t statements for writing an identification and possibly the current values of critical variables. Such output statements have to be inserted into all branches of the program; hence, octal dumps of memory are easily obsoleted, like programming in octal. The difficult part now, in case of an error, is the patching-up of the program in higher-level language. Even many modern translator systems are very deficient in this respect, although separate translation of patches and insertion at load-time does not impose any severe difficulties. B. D e b u g g i n g
and test data
Programs in General Usually a program is debugged with a set of input data for which the results to be obtained b y the program can be verified. This. process is not unique, however. Therefore, it is usually rather simple to find data sets for which a given program will fail. To debug or define the identity of one program completely and unambiguously, several different sets of continuous intermediate results are required. Such sets are sufficient to regenerate a program as demonstrated in [4]. (One set of end results from one data set permits an infinite o1" near-infinite number of different programs.) Some effort has already been spent for the generation of test data for COBOL programs [5]. This test data generator operates in conjunction with the COBOL compiler to test the data facilities defined in a COBOL program.
Software Systems Techniques to test software systems differ remarkably from those which test hardware. In testing programs for hardware, the diagnostics, all functions of a machine are tested, i.e., executed for standard data and i Here the automatic
generation of a flowchart for the program
would be helpful.
56
W.H. BLTRK~ARDT:
the results compared to stored values, which have been established previ.ously. I t seems t h a t the process is different for software systems, because of the huge number of functions performed by a software system. Usually the software system is debugged and tested like an ordinary program with a few, sometimes very large, programs; and not like a system. This seems inadequate and is perhaps the reason why, many years after the release of a system, errors in software systems are still able to come up.
Program Testing The task in program testing is to assure the reliability and correctness of the programs in t h a t they satisfy the given specifications, including those for error detection and correction. Therefore, a complete test procedure would contain the Functional Test, the Marginal Test, and the Error Test. The Functional Test verifies the correct implementation of the specifications. All legitimate possibilities of the design specifications have to be examined, which is an enormous job. For example, in a compiler system, these possibilities consist of all eventual variations of source language statements and sequences of them. To limit this task, reasonable restrictions must be imposed to accomplish this testing within a sensible time and with justifiable effort. The Marginal Test is here defined as one which analyses further possible interpretation of the specifications and considers different programming and operating situations resulting from different hardware configurations. The Error Test cheeks the error detection and automatic correction facilities.
II. Method The method employed in this system is based heavily on the techniques from syntax-directed eompliers [6, 7], but several new definitions are significant. A. D e f i n i t i o n s Programs are considered to consist of basic syntactic units in which the correct connections in the language are defined by phrases. A phrase consists of at least two syntactic units: the defined one, and one or more defining units. Any syntactic unit not further defined is a so-called basic unit (so). So the phrase for the definition of unit sj is s t -~ aj~ s~.
(1)
The operator Gj~ means syntactic summation over i in the phrase j of the syntactic units s~i, which are combined or separated by concatenation or alternativity. Gji is a combination of the operators for concatenation and alternativity and is more general than the operators /~ and A from [8] separately. Recursive and cyclic definitions are named with recursivity of the order n, where the number n gives the number of phrases in the cycle to
Generating Test Progrmns
from
Synt.~x
57
the recursive definition. So the usual definition of recursivity when at least one 8ji in the phrase (Eq. 1) is eq~M to sj is termed recursivity Of first order. If no reeursivity occurs in this phrase but a unit sji is given b y ,%~ -~ Gj~ ~j~,,
(2)
and at least one 8~,~ is equal to 85 fi'om Eq. 11 then recursivity of second order exists. Whenever a unit is recursive without a nonrecursive alternative unit, then a degenerate phrase is present. A shorthand description of reeursivity might be X -+AfBXC, (3) where, of course, B or C may be empty. This description stands for the sequence of all units that are met on the path of the dashed line in Fig. 1, or B B B . . . A . . . CCC. Evidently, the number of syntactic units B and C depends on the number of successive substitutions of X. This is I ~_~ ~, the recursivity number (RN), which is similar to the "degree of embedding" [9]. " In automating programming the most interesting syntactic unit will normally be "program". A program is given in a language for which the set of ! ~x~----~ all the phrases belonging to it, the syntax, is representative. On t:he other hand, an actual program consists o f basic units. Some of them occur more often than others, but always in the correct order Fig. 1. Explanation of Recursivity as defined by the syntax of the language. In programming a problem for a computer, a programmer selects the appropriate units to meet the given specific require ments. So the program p~ for the solution of a problem can be defined b y Pn ~ Sn D Gji 8]i.
(4)
The program p,~ is then given in the language of the set of phrases, Gj~ sji, when D is an algorithm that distributes all further defined syntactic units with their definitions, and S~ is an algorithm (the programmer) to select the required set of syntactic units for the special computation task n. It is interesting to note that the selection algorithm S~ is logically independent of both j and i, which means that each syntactic algorithm can be constructed to handle the syntactic descriptions of all languages of this type and to produce or recognize all possible programs in these languages. The syntax of a programming language can be represented by a tree [10], so the generation of the program can be easily visualized. (See Fig. 13 in Section VI.) B. B a s i s f o r A u t o m a t e d
Programming
~u the selection algorithm for a special problem (S~ in Eq. 4) can be determined, the desired program for the solution of this problem can be generated automatically. The correctness of the selection algorithm determines whether the obtained program will be a correct solution to the
58
IV. H . BURKttARDT :
problem. Since the correct definition of the language will be user the produced programs are naturally correct in respect to the language. In order to define the selection algorithm, it is necessary to investigate the problem to be solved. In the present example for the testing of software systems, the task and requirements can now be defined. C. T h e T a s k To perform the checking in the Functional Test, a program or set of programs ought to cover all possibilities of the language, i.e., of its definition. A set of programs with this quality is identified as a "Basic Set", and it has the characteristics of Generality, Correctness, and Completeness.
1. Generality. The implementation of the idea was planned in such a manner as to be applicable to all different software systems in which the languages or their equivalents can be defined formally. Further, if the formal description is exchanged for that of another language, automatically correct test programs for t h a t language are obtained without any reprogramming of the generating program or parts of it. I f the generating system is written in a higher-level language, it can be used on all machines for which a compiler for this language is available. 2. Correctness. The method uses the definition of the language. Therefore, the generated programs are necessarily correct if the correct description is used. For testing the error detection and recovery facilities in a software system, it is possible to change one or more elements, e.g., in the definition. Because a definition is representative of a language, the now produced programs are a picture of the altered description and can be test programs for the checking of the error detection and recovery facilities. 3. Completeness. The method not only gives some test data, but a complete set of correctly possible elements of the language and the connections between them. This point is important because it sustains confidence in the results and assures t h a t the possibilities of the language are covered completely. Some difficulties arise, however. The definition for a programming language for practical computations must be of such a form as to allow an infinite or near-infinite number of programs. A corresponding number of elements in the produced test programs is to be expected. Therefore, techniques have been included in the method to limit the number but not the succession of the elements. Moreover, the requirements are such t h a t the restrictions imposed can be lifted gradually. D. D e f i n i t i o n o f a B a s i c S e t A Basic Set of programs for a given language exists when each syntactical unit hierarchically contained within the tree of the syntactical unit "program" appears at least once within the set of programs. A Basic Set of a piece of program is defined similarly, except t h a t only a single branch of the total syntax tree is involved.
Generating Test Programs from Syntax
59
E. B a s i s f o r t h e M e t h o d The syntactic definition of a language is used as basis for the method. The reason is that, for the generation of test eases, the proper starting point has to be as close as possible to the definition of the language. I f a language is given by a definition other than a syntactic one, this has to be used, possibly with an algorithm to extract the features of the language to the internal form for the present system.
1. External Description of the Syntax. Several forms of syntactic descriptions are possible. However, the generating system requires a standardized representation as input. (See Fig. 2.) This form without explicit rectaoperators seemed to impose excessive restrictions. So a prepass was constructed which accepts a syntax in several different representations. (See Fig. 3.) DIGIT STATEM.TERMIN. RROGR.TERMIN. OPERATOR ASSIGN.SIGN BRACKET1 BRACKET2 NUMBER SLABEL NAME CONTROL-NAME CONTROL-STATEM. EXPRESSION EXPRESSION-S COMPUToSTATEM, UNLAB~ STATEMENT 9
STATEM.GROUP PROGRAM END OF SYNTAX
I 2 3 EOS E N D
4
5
+
/
*
*
6
7
8
9
*
= ( I DIGIT NUMBER N A M E DIGIT I F CONTROL-NAME BRACKETI EXPRESSION BRACKEI2 LABEL NAME NUMBER EXPRESSION OPERATOR EXPRESSION EXPRESSION EXPRESSION OPERATOR ( EXPRESSION-5 ) NAME A S S I G N . S I G N EXPRESSION-S COMPUT.STATEMo CONTROL-STATEM. LABEL UNLAB.SIAIEM. STATEM.TERMIN UNLAB.STATEMo ST~ TEM.TERMIN. STATEMENT STATEM.GROUP STAIEMENT STATEM.GROUP PROGR.TERMIN,
Fig. 2. Example Language Definition fbr I n p u t to the System
Blanks and blank cards are ignored by the prepass. All representations for concatenation are replaced by one blank, and the separators for alternativity are replaced by two blanks. The brackets < and > from BACKUSNAUR-Form are ignored. The length of the names for the syntactic units is unrestricted for input to the prepass. (A restriction of the names to six characters or one machine word on the 7090, as in SHADOW [11], is only reasonable for very small definitions. Normally the abbreviations are very unintelligible and are troublesome in larger systems.) The prepass, where necessary, cuts the length of the names to eighteen characters. Any newly defined phrase starts in a new line containing the syntactic unit to be defined and the definitions with as m a n y continuation lines as necessary. Further details are given in Section VI, especially with regard to Fig~. 2 and 3.
2. Internal Description of the Syntax. Internal to the generating system, the syntactic units in the phrases are coded by integers (similar to [12]).
60
~V. H . BURKHARDT:
Since the translation list is available, it is Mways possible to change back to the alphanumeric definitions. The translation is done by an administrative program, producing two tables from the input of the syntax: the trans(DIGIT) .= (I)I (2) I ( 3 ) 114) I 15) I (511 17) / ( 6 ) 1 ( 9 ) ., (STATEM*T~RMIN. ) . = (EO$) .. (PROGR. TERMIN. ) . = (E) - ,N) - (D) .J (OPERATOR). = (+)I(-)I(*)/(/)/(~)-(~) .J ,ASSIGN. SIGN ) .= {=) .j ,BRACKET I ) o= (I) .~ ( BRACKET 2 ) . = ')) .~ (NUMBER) .= ,DIGIT) .~ $(LABEL} .= (NUMBER) ., 'NAME) . = IN) - (A) - (M) - (E) - ( D I G I T ) .. [CONTROL - NAME ) .= ( l ) - IF) .~ CONTROL - STAIEM.) . = ,CONTROL-NAME) - (BRACKET ()-(EXPRESS|ON) - (BRACKET 2) ,LABEL) .. (EXPRESSION). = (NAMEII(NUMBER)IIEXPRESSIONI-IOPERATORI-IEXPRESSION-S)..
EXPRESSION-SI.={EXPRESSION}/(EXPRESSION)-(OPERATOR)-,()-IEXPRESSION-S) ()) .= COMPuT. STATEM.)o=(NAME}-(AsSIGN.SIGN) - (EXPRESSION-S) . . -
(UNLAB. STATEN.) o= (COMPUT.STAIEM.) / (CONTROL-STATEM.) STATEMENT).= ( L A B E L ) - I U N L A B . S T A T E H . ) - ( S T A T E M . I E R M I N . ) I ( UNLAB.STATEM. )-,STATEM.TERM|N.) .. ,STATEM.GRoUP).=
(STATEMENT)
( P R 0 G R A M ).=
,$TATEM.
/
($TATEM.GROUP)-,STAIEMENT)
GROUP) -
=s
..
(PROGR.TERMIN.).=
END OF SYNTAx
Fig. 3. Example Language Definition for Prepass NO.
NAME
DEFINIIION
85 84 83 82 81 80 79 78 77
PROGRAM STAIEM.GROUP STATEMENT UNLAB.STATEHo COMPUT.STATEM. EXPRESSION-S EXPRESSION CONTROL-STATEN. CONTROL-NAME NAME SLABEL NUMBER ~RACKET2 BRACKETI ASsIGN.SIGN OPERATOR PROGR,TERMIN. STATEM.TERMIN. DIGIT
84 83 75
75 75 74 73 72 71 70 69 68 67
1
BASIC-UNIT
81 7b 79 76 77 27 39 74 67 30 b2 L3
18 23 65 3 // 2 II 11 // // [/ // //
69 // 82 I/ 71 // // 72 24
~3 //
82
79
70
62
80
74 7g
1/ 73
79 75
70
~0 79
19
38
23
67
II 3g
14 22
11
46
11
51
//
46
/1 9 II B 18 24 55 41 52
4 /1 3 I/ l/ II // 1/ // //
1/ 10 II 9 19 25 36 42 ~3
5
// 11 II lO 20 25 37 43 54
6
t/
7
tt
II 11 21 27 38 45 55 82
O tl II // 11 [I // tl
II 13 22 29 ~9 46 56 b5
58
B4 b8
68
78 80
b9
// 4 II 11 // // 1/ // 11
61
5 11 11 I/ /1 // // II
46
8 7 I1 11 // II II // I1
14 23 30 40 51 57 bb
Fig. &. T a b l e S y n t a x o f E x a m p l e L a n g u a g e
lation and the definition tables. Both are connected by references to the internal numbers.
Generating Test Programs
from
Syntax
61
A small administrative subroutine SYTAB prints out the syntax automatically in a recently published format [13], if desired. Thus, much tedious work and the introduction of many errors into a syntax is obviated in the development of programming languages. The table syntax of the example language is printed out in Fig. 4 with the numbers in the DEFINITION column referring to the translation table (Fig. 5) and using "//" as meta-operator for alternativity. 85 83 81 79
77 75 73 71
69 67 65 6I
ss 56
PROGRAM STATEMENT COMPUT.STATEM. EXPRESSION CONTROL-NAME LABEL BRACKET2 ASSIGN.SIGN PROGR.TERMIN. DIGIT EO$
~
84 82 80 78 76 74 72 70 68 66 52 59
$TATEH,GROUP UNLAB.STATEM. EXPRESSION-S CONTROL-STATEM. NAME NUMBER BRACKETI OPERATOR STATEM.TERMIN. EMPTY ( Z
57
X
W
55
u
54
u
53
T
52 46
S
5i
/
* R P N
45 42 40
$ O 0
H K )
H
38 36 30 27 25
24
F
23
E
22
D
21
C
20 18
B +
tg 14
13 10 8
=
11
A 9
8 6
9 7
7
6
4
5
3
4
2
3
1
2
0
43 41
39 37 35 29 26
L J .
I
G
5
Fig. 5. Translation Table
III. Description of the System A. The G e n e r a t o r P r o g r a m (GEP) The heart of the generating system is the generator program. It is logically divided into five programs: Search Program (SEP), Forwarding Program (FWP), Output Program (OPP), Selection and Discriminator Program (SDP), Completion Test Program (CTP). The completion test program is originally a part of the selection and discriminator program.
62
W. ~l-~.BURKttARDT:
1. Search Program. This program starts from the given syntactical unit and develops the tree of the units which are hierarchically subordinated to this unit (the operator D in Eq. 4). At all branches where basic units are encountered the forwarding program (FWP) is called. The mechanism of the search program consists of running up and down the branches of the tree to the basic units (see Fig. 6). All branch points are stacked when s~P
)
L ........... j
r
Fig. 6. Search Program running down to allow correct continuation in the syntax when running up. Thus, concatenation of syntax units is treated successfully. The selection between the alternatives of the definitions is made by the selection and discriminator program. All syntactic units are marked as they are used t o assist the SDP.
2. Forwarding Program. The F W P picks out the basic units and stores them for retrieval by the output program. In the FWP, translations can be performed from some basic units into others (see Fig. 7). However, this ability is not used in the present model of the program. Whenever the OPP has to be called, control is transferred to t h a t program. 3. Output Program. Here the produced items are translated from internal to external format. To achieve its goal, OPP must distinguish two classes of syntactic units. The members of the first class are all equivalent for OPP and have no semantic meaning; but a few units (like End-Of-
Generating Test Programs from Syntax ~iP
~
63
)
. YES TRANSLATE UNiT
I .......... ? IN OUTPUTLIST
YES
YES
'
Fig. 7. Forwarding Program Statement (EOS), etc.) from the second class need some semantic interpretation in OPP.
4. Selection and Discriminator Program. The most important part of the system is the SDP (see Fig. 8). It works and acts as a programmer to determine which branches of the syntax tree are used b y the search program and, therefore, which basic units are generated. It is an algorithm, S~ in Eq. (4). For the present example, basic sets of programs and pieces of programs are needed. Obviously, several kinds of SDPs are consistent with the goal. In the present model of the program, three different versions of SDP are proposed: Shortest Path Selection (SPS), Sequential Substitution Selection (SSS), Random Substitution Selection (RSS). The different handling features of these programs come into action only after the CTP has shown after the first generation phase that not all syntactic units have already been used. In all subsequent phases a different performance in the three versions takes effect. Common to all three is the handling of recursively defined units. They require special treatment and are therefore stacked separately in SEP. In Fig. 1 the meaning of recursive
64
W.H. BUmCmx~T)T:
definition and the explanation of the recursivity number (RN) was defined. This number is given to the program SDP, and the resulting programs differ according to RN, although all satisfy the requirements of a basic set of programs. In execution, a definition is repeated until RN is exceeded; then the next alternative definition in the phrase is used. Semantic content of a programming language is treated as in [14] (remark the $ sign in front of unit "label") and later generalized by [15].
YrS
DOUSLEFLAG
TAF[ALT~RNATIV~
YEs
GENERATE A RANDOm4 NU,~ER AND TAKE F~ESPECT~VE UNIT IN pt4RASF
Fig. 8. Selection and Discriminator Program
Shortest Path Selections. This version selects, from several alternatives for development, t h a t syntactic unit under which there is still an unused unit available, starting with the shortest path. In the example A -,B[CID[E, where there are still unused syntactic units in the branches under C and D, SPS takes first alternative C after B. As soon as all alternatives in the phrase and in the subtrees are used and the phrase is reached again, the first alternative is always chosen (unit B in the example). With this version quick surveys are obtained of the effects on programs and parts of programs resulting from changes in the definition.
Sequential Substitution Selection. Using the same example as above, the version SSS first takes alternative B. Upon reaching the same phrase again, it selects the second alternative, C. At the third time, the next free
Generating Test Programs from Syntax
65
unit, D, is chosen, and so on. When all units in a phrase have been used and the phrase is reached again, the cycle starts again at the first alternative. With this version, very different representations of programs are generated. Better use of the power of SSS can be achieved by specifying a repetition number (RP), which gradually lifts the restriction on the number of permutations. This repetition number (RP), if given, defines how often the starting syntactic unit has to be used. SYCH A
)
YES
YES
ORECOiRDINAT[ S
~,VSYNTAX RITABLE TTRANSLATI E AND TABLEONI
Fig. 9. Syntax Charter
Random Substitution Selection. Similar to SSS is I~SS with the characteristic that RSS selects randomly among the given alternatives whenever a phrase is met again after all alternatives have been used. However, this is allowed to occur only when the completion test program (CTP) has shown that all units in the syntax tree had been reached at least once. Otherwise, the process of generation would need too much time or the resulting programs would not belong to the class of Basic Sets. For achieving m a n y different representations, a repetition number can be specified for this version as well. Computing 2[1
5
66
W . I-I. BURKttARDT:
5. Completion Test Program. The CTP is a BOOLEan function to determine whether all syntactical units hierarchically subjected to a given one have already been used. I t determines when the requirements of generating a Basic Set are satisfied. Because of the close relationship of CTP to the definition of Basic Sets, this program was separated out of the SDP.
(
.....
)
YES
[
~
OUTPU]" FIELD
RAS
NO
FORWAR0 TO NEXT NAME
Fig. 10. Syta,x T~b|e Generator
B. S y n t a x C h a r t e r Using some of the techniques of the program generator, a program for charting the syntax of a given language has been developed. The program for it, SYC, is similar to the search program (SEP). However, SYC produces the chart of ~11 syntactic units as they are met instead of first progressing to the basic units in the syntax tree (see Fig. 9). The task of the charter is carried out in two phases. In the first, the appropriate coordinates are assigned to each syntactical unit, and two tables are formed. In the first table, the coordinates are combined with indicators for the type of connection of the units and associated with their names, coded by numbers. The second table contains the translation dictionary of the names into the internal numbers for the units.
Generating Test Programs from Syntax
67
The second phase of the syntax charter uses the two tables as input. The syntactical units are selected here by their coordinates, arranged in convenient form, and printed out similarly to the printing of flow charts, optionally either with the internal numbers or with the external names. The techniques of printing flow charts are well known [16], so this phase is not described here. More details are included in the examples in SectionVI. The Syntax Table Generator works similarly to the Syntax Charter Program (see Fig. 10). C. A p p l i c a b i l i t y The program generating system with the programs G E P and S u can be applied very widely. The syntax of any language which can be brought to acceptable form by a prepass (if not already stated in that form), is accepted. With G E P it is then possible to generate Basic Sets of programs for that language. The chart program (SYC) is very useful for quick surveys on programming languages or parts of them. It was indispensable in the development of the program generating system. It gives a picture of the hierarchy of any unit and the complexity of the language. The complexity of a language can be defined as a function of the following: The number of nonredundant steps between the highest and the lowest syntactical unit (the depth of a language). The number of branch points in the syntax (which is a function of the depth). The number of recursively defined units. The number of basic units in the language. The last two determine the width of a language.
RESULTSI
,,
Fig. 11. Test Configuration
IV. Use of the System Having obtained a system for generating programs automatically, it is possible to devise a system for checking of systems. The overall configuration and operation m a y have the form described in Fig. 11. 5*
68
W.H. BvgI~ARI)~:
This scheme is more general than a similar one used for process control computer checkout [17]. I t imposes no restrictions on the particular computer implementation. A. I m p l e m e n t a t i o n The test program generator gives a program, which has been generated according to the previously outlined criteria, as input to the simulated system and to the actual system. Here it is necessary to distinguish among several cases. First, the systems can translate the programs and compare the results. This is very advantageous if a standard language system is used in the generation of programs. Then the actual results (Results 2 in Fig. 11) can be compared to previously obtained standard results, which have been modified according to the actual systems configuration and replace the Results 1 in Fig. 11. Second, the systems can translate and execute the generated test programs; standard sets can be used, likewise, for standard languages. The results are compared in the evaluator program for : 1. Object code comparison. 2. I~un-time results (if a program runs in an infinite loop, it is interrupted this too is a comparison criterion). Printout resulting from the comparison may be, by option, all indications for successful completion of a program or its parts, or solely the error printout in case of discrepancies. Previous mention was of the different versions of the SDP part in the system. The shortest path selection version (SPS) is well suited for a quick cheek if all syntactical elements of a language are implemented. The other, more important version -- sequential substitution selection (SSS) -- emphasizes the checking of the correct implementation of possible connections between syntactic units or language elements.
V. Application to other Areas The first application of the system was, naturally, to generate automatically Basic Sets of programs and of parts of programs for the testing of software systems. Besides the use as a pilot model for automated programming, several minor applications are noteworthy. I t can be very useful for detection and correction of ambiguity and errors, e.g., in character and speech recognition systems. I t could be used as a model for generating macro-statements, functions, and programs in higher-level languages. It has already proven its usefulness for research and development in programming languages. VI. Examples For illustrative purposes, an example language was constructed in a subset of a FORTRAN-like language; and therefore the output format is consistent with the requirements of FORTRAN.
1
b
67
76 -
81
74
i
82
5
83
II
84
I
85
-
AND ALTERNATIVITY
UNITS ALREADYDEFINEO (~) BASICUNITS
]
7r
Fig. 13, SyrLtax-Ch~rt of the Ex~mlol~ L~ngu~ge
7L--80
~CONCATENATION
O
0~
Oq
CD
Q
70
W.H. BU:aK~IARDT:
In Fig. 3, the example language is given in a BNF-like encoding for the prepass. The sign tbr is coded as --~
assignment
. -~-
1 ; b < and >
alternativity end of phrase concatenation unit enclosure
/ . , -
-
( and )
Fig. 2 shows the output of the prepass and the syntax in the form as accepted by the system. The picture is self-explanatory. With the programs GEP and SYC a translation table is obtained, as seen in Fig. 5. 85 74 5 9 81 38 13 74 14 46 62 77 62 75 68 23
0 4000 6022 6062 4090 6111 6140 7166 8192 8231 6271 5300 6320 5355 3385 2410
8# 67 6 i0 76 25 80 79 46 79 80 27 79 68 84 39
1000 5000 6032 6072 5090 6121 5151 7176 8202 7245 6285 6300 5335 3361 2396 2421
83 3 7 11 39 67 79 70 51 79 30 24 73 65 83 22
2000 6000 6042 6082 6090 6135 6150 7181 8212 6256 6291 6311 5341 4360 2405 2431
"75 # 8 62 19 71 76 18 46 70 78 72 30 82 69
3000 6012 6052 3091 6101 5141 7154 8180 8222 6265 4302 5321 6340 3376 1411
Fig. 12. Table of the Syntax Chart (Example Language) The results from the first phase of the chart program SYC can be recognized in Figs. 5 and 12. In the table of Fig. 12, the syntax chart of the example language is given in coded form. For the coordinates, one step in the direction of "definition" has the value of 1000; in the direction NAME2=NAME3 IF(4)l ENb NAMET=NAMES+NAMEg-NAMEI~NANE2/NAME3**(4+(NAME5-(~'(NAME7/NAME i8))}) [F{NAME9~6 NAME3=4*~(NAMEB+NAME6-NAMET) IFtS)2 NAME2=NAME~NAME4/NAME5 IF{6)1 END Fig. 14. Generated Resul~ ~om Example Language (SequentiM
Substitution
Selection,
RN
=
3, R P
= 0)
of "concatenation" or "alternativity" the value of one step is 10. So, for example, one unit 23, which is the letter E, is located six steps into "definition" and 12 steps into "concatenation" or "Mternativity". The last digit in the number of coordinates gives the environment of the syntactic
Generating Test P r o g r a m s from S y n t a x
71
unit and determines the geometric figure and the connections in the printing pass, with the following m e a n i n g s : 0 for the recta-linguistic operator "definition", 1 for concatenation to the syntactic unit to the left, 2 for Mternativity in the same respect. NAME2=NAME3 IF(4)! END
NAME7=NAMES§ I**(NAMEg§ [F(NAME4/NAMES)6 NAMES=NAMEg**(I) IF(NAME2+NAME3)7 NAME6=NAME7 IF(8~5 NAME2=NAME3-NAME4*NAMES/(6) |FtNAME/*-NAME8)I END
Fig. 15. G e n e r a t e d R e s u l t for E x a m p l e L a n g u a g e (Sequential Substitution Selection, R N ~ 5, R P = 0) I
0 2
NAME2=NAME3 IF(4)1 END
NAMET=NAMES+NAMEg-NAMEI*NAME2/NAME~*t(4+(NAM~5"|6*(NAM~7/NAME 18))}) IF(NAMEg)6 NAME3=4*t(NAME5+NAME6-NAMEI)
IF{8~2 1
0 0 9
NAME2=NAME~NAME41NAME5 IF(6)| END
NAMEg=NAMEI**NAME2§
IAMEII(2*'INAME~+NAME4-NAMES))}) END NAMEg=NAMEI*NAME2/NAME3**NAME4+NAME5-NAME6 IF(7)8
NAMEI=NAME2*NAME~INAME4**(5)(NAME6-NAM~7*NAME8/{9*)(NAMEI+NAM
|E2-NAME3)))) IF(4)9 6
NAMERsNAMEO*NAMEg/NAME!
END 4
5 2
2
NAMES=NAMEO**NAMET§
|AMEO/(7~*INAMEB+NAMEg'NAMEI)))) IF(2)4 NAMEb=NAMEZ~NAMES/NAME9 IF(1)5 NAME~=NAME4~aNAMES§ IFINAMEBeNAMEg)2 ENU NAME3=NAMs END
Fig. 16. G e n e r a t e d R e s u l t (Sequential S u b s t i t u t i o n Selection, R N = 3, R ~ ~ 5)
The addition of the number 4 to each of these indicates that the syntactic unit was already defined elsewhere in the chart. (For more elaborate pictures with coordinates, as in [18], the coordinates of the defined place can easily be included here.)
72
W.H.
BUmCI-tAItDT:
I f the s y n t a x is printed from Fig. 12 w i t h o u t rearrangement, a picture as in Fig. 13 is obtained. Because of the encoding of the coordinates presently chosen in F O t ~ T B A N , a limit is given for 100 steps in d e p t h and in direction of concatenation and Mternativity. I n Fig. 14, a basic set of programs generated with SSS, recursivity n u m b e r B N = 3 and repetition n u m b e r R P = 0 is listed. B y comparison with Fig. 15, the influence of a higher R N = 5 can be studied, whereas 1
1 2 3
4
NAME2=NAMEd IF(4)I IF(5}| tF(6|I IF(7)I IF(B)1
IF{9)I IF(NAMEI§ IF(NAMEI"~NAMEI)! END ' NAMEI=NAMEI+INAMEI+(NAMEI+(NAMEI+(NAMEI)})} NAMEI=NAMEI NAME|=NAMEI NAMEI=NAMEI NAMEI=NAMEI END
Fig. 17. Generated Result (Shortest Path Selection, RN
-
-
3)
Fig. 16 reveals the power of the repetition n u m b e r (RN = 3, R P = 5 in this ease). Figs. 17 a n d 18 show the results with SPS, R N = 3, and t~N = 5. T h e potentialities of the recursivity and repetition n u m b e r s and the differences in the action between SSS and SPS of the selection and discriminator p r o g r a m S D P are quite obvious b y comparison of the results. T h e dif-
~ "
! 2
NAME2=NAME3 1F(4)I IF(5)I IF(6)I IF{7)I IF(8)I IF(9)|
[F(NAMEI*NAMEI-NAMEI*NAMEIINAMEIttNAMEI+NAMEIII END NAME|=NAMEI~(NAMEI~(NAMEI+(NAMLI+(NAMEI+(NAM~I+INAMEII)))}~ NAME|=NAMEI NAMEI=NAMEI END
Fig. 18. Generated Result (Shortest Path Seleetiort, BN .... 5) ferences do not a p p e a r in the first lines. This results because of the goal of generating the basic set as fast as possible. Although all possible perm u t a t i o n s in the language are n o r m a l l y not generated, this aim can be approached, if necessary, b y choosing the reeursivity and repetition n u m bers appropriately.
Generating Test Programs from Syntax
73
Acknowledgement Some material for this paper was worked out while the author was employed by the Product Test Laboratory of the Data Systems Division of IBM, Poughkeepsie, in 1963 [19]. Thanks are due to the IBM Corporation for release of this material and to Mr. L. E. Mcdonald of Computer Control Company for his critical evaluation of the manuscript.
Relerences [1] MADDEN, D.: Prologue to Sessiou 1 at the FJCC 1965, A F I P S ConL Prec. 27, I I (1966). [2] JACOBY, K. and H. LAYTON: Automation of Program Debugging, Prec. 16th ACM Conference, 12C--2 (19(}1). [3] ORCHARD-HAYS,W.: The Evolution of Programming Systems, Prec. I R E 49, 283--295 (1961). [4] BUl~KHARD%W. It.: The SELF Program Generating System, Centre d'Etudes et Reeherches, La Gauds AM, France, Internal Report., Nov. 1962. [5] SAUDEZ, R. L.: A General Test. Data Generator for COBOL, AFIPS Prec. SJCC ~3, 317--323 (1962). [6] MrLEWSKI, A. T. : A Syntax Defining Notation and Decomposition Algorithm, Centre d'Etudes et Recherehes, La Gaude AM, France, Internal Report, May 1963. [7] C~EATHA~, T. E. and K. SATTLEY: Syntax Directed Compiling, A F I P S Prec. SJCC ~5, 31--57 (1964). [8] BAaNET% M. P. : Conthzued Operation Notation for Symbol Manipulation and Array Processing, Comm. ACM 6, 467--472 (1963). [9] CHO~SXY, N. and G. A. MILLE~: Handbook of Mathema.tica] Psychology, Vol. II, p. 286. New York: Wiley & Sons, Inc. 1963. [10] CHOMSKY, N. and G. A, MILLER: Finite State Languages, Infbrm. Contr. 1~ 91 (1958). [1-1] BA:R,NETT,M. P. and R. P. FU~'RELLE: Syntactic Analysis by Digital Computer, Comm. ACM 10~ 515--525 (1962). [12] I~ONS, E. T.: A Syntax Directed Compiler for ALGOL 60, Comm. ACM 4~ 51--54 (1961). [13] IVEnSON, K. E.: A Method of' Syntax Specification, Comm. ACM 7, 588--589 (1964). [14] SCHORRE, D. V.: A Syntax Oriented Compiler Writing Language, Prec. 19th ACM Conference, D1. 3 (1964)., [15] FELDMAN, J. A.: A Formal Semantics for Programming Languages, I F I P Prec. 65 Congr. Vol. II; p. 435--436 (1966). [16] A~DE~SO~, H. E.: Automated Plotting of Flowcharts on a Small C6mputer, Comm. A C M 8, 38 (1965). [17] KILGOI~E, G. L. and R. E. HOHMEYER,: A New Technique for Testing of Prograins for Process Control Computers, 1965 I E E E International Convention Record 13, 256--260 (1965). [18] TaYLom W., L. T u s h E s , and R. W~YC~OFF: A Sy~ltaetical Chart of ALGOL 60, Comm. ACM 4~ 393 (1961). [19] BURKHARDT,W. I-I.: Automatic Generation of a Basic Set of Programs, Report P 1124, IBM Company, Poughkeepsie, March 1964. Dr. Walter H. Burkhardt Computer Control Company, Inc. Framingham, Massachusetts, U.5'.A. Present address: Radio Corporation o] America, Electronic Data P~'ocessing Division, Cherry Hill, Camden, New Jersey, U.S.A.