Towards formal open standards: formalizing a standard’s requirements
Open standardization seems to be very popular among software developers as it simplifies the standard’s adoption by the software engineering. Formal s...
Abstract Open standardization seems to be very popular among software developers as it simplifies the standard’s adoption by the software engineering. Formal specification methods, while very promising, are being adopted slowly as the industry seems to have little motivation to move into this territory. In this paper the authors present (1) the idea of applying formal specification techniques to open standards’ specifications, and (2) an example of a formal specification of the Rich Site Summary (RSS) v2.0 open standard. The authors provide evidence for the advantages of the open standards formal specification over natural language documentations: formal specifications are more concise, less ambiguous, more complete with respect to the original docThis research has been co-financed by the European Union (European Social Fund ESF) and Greek national funds through the Operational Program ”Education and Lifelong Learning” of the National Strategic Reference Framework (NSRF)—Research Funding Program: THALIS.
School of Electrical and Computer Engineering, National Technical University of Athens, Heroon Polytechniou 9, 15780 Zografou, Greece
2
School of Information Sciences, University of Tampere, Kalevantie 4, 33100 Tampere, Finland
3
School of Applied Mathematical and Physical Sciences, National Technical University of Athens, Heroon Polytechniou 9, 15780 Zografou, Greece
umentation and, when using certain kinds of specification languages, executable and reusable as they support module inheritance. The merging of formal specification methods and open standards allows (1) a more concrete standard design; (2) an improved understanding of the environment under design; (3) an enforced certain level of precision into the specification, and also (4) provides software engineers with extended property checking/verification capabilities, especially if they opt to use any algebraic specification language. The authors showcase how the RSS standard can be formally specified using an algebraic specification language and demonstrate how can that be beneficial. Keywords Open standards · Formal methods · Formal specification · Rich Site Summary (RSS)v2.0 protocol · CafeOBJ
1 Introduction All standards are accompanied by a specification document that is available for anyone to see. That document, usually written in natural language format, is an explicit set of requirements—set out by the standard setting organization— that are to be satisfied by any implementation of the standard [10]. They are necessary in order to conform to the standard’s requirements; however, there are some issues with those specification documents: 1. Verbosity: Such documents can be huge in size. For example, the Digital Imaging and Communications in Medicine (DICOM) open standard, a standard for handling, storing, printing, and transmitting information in medical imaging that includes a file format definition and a network communications protocol, provides on its web-
123
K. Barlas et al.
site [4] 18 specification documents that span a total of 4900 pages, with one document taking—on average— 270 pages. 2. Lack of clarity: It is not easy to be precise and unambiguous when expressing terms in a natural language, without resulting in a lengthy and difficult to read document [38]. The understanding of a natural language relies on the fact that the authors and readers of the specification are using the same words for the same concept. This leads to misunderstandings because of the built-in ambiguity of natural language. An excellent example of this concept can be seen in [28], where the authors argue about the ambiguity of language by discussing two signs displayed by an escalator saying: “Shoes must be worn” and “Dogs must be carried”. It can be quite difficult to try and formally interpret those signs. Do I have to carry a dog? What about the shoes that are in my suitcase? Do dogs have to wear shoes too? When it comes to those signs, we can all understand what they imply, but perhaps that is only because we share the context. When it comes to reading the specification of something we are not too familiar with, such problems might occur frequently. Clarity is important when trying to formalize a statement. 3. Requirements amalgamation: Different requirements of the standard may be fused together into one single requirement; since there is no easy way to modularize requirements expressed in natural language, it may be quite difficult to find all related requirements [38]. To discover the consequence of a change, you may have to look at every requirement rather than at just a group of related requirements. It is also possible that one can find one requirement in different parts of the documentation and expressed in different ways due to the over-flexibility of the natural language adding up to the confusion. 4. Requirements confusion: It is not uncommon to observe mix-ups between functional, non-functional requirements, system goals and design information. [38]. While the issues listed here and in Sect. 2.1.2 also apply to non-open standards, this paper only focuses on the open ones as they are the only type of standards that have their specification documents available to the public. Such problems can make requirements specifications that are written in natural language prone to misunderstandings and errors. Most of the times, these errors are only discovered during later phases of the software development process and may then be very expensive to resolve [38]. In this paper we: – Argue about how creating a formal description of an open standard’s specification document (that results in something we call a formal open standard) can help with the four problems introduced before (Sect. 3).
123
– Give an example by providing a formal specification of the RSS v2.0 open standard and check if (and how) this specification technique can be better than the natural language one (Sect. 4).
2 Background material 2.1 Open standards An open standard is a publicly available standard that has various rights for use associated with it, and may also have various properties of how it was designed. This however is a problematic definition since the term “open” is not an easy one to define properly. In fact, there are many different definitions available coming from dictionaries, national IT agencies, Interoperable Delivery of European eGovernment Services to public Administrations, Businesses and Citizens (IDABC), World Trade Organization (WTO), or deriving from governments, from the Organization for the Advancement of Structured Information Standards (OASIS), American National Standards Institute (ANSI), and the list can go on. Some recurring themes can be found though in all those different definitions, and they mainly have to do with the motivation, development and usage of open standards. The motivation can be traced back to encourage interoperability and help popularize new technologies. Development-wise, an open standard is: 1. usually developed by an open process so it is easy for anyone to participate in and 2. open to public input [32,34], 3. not controlled or tied by any specific group or vendor. While there might not be a short-term monetary gain for developing and adhering to open standards, the long-term benefits (such as sustainable development, smooth communication channels and interoperability) make it worth the effort. 2.1.1 Benefits of open standards The benefits from following the principles defined in open standards are summarized below [22,37]: – By adherence to open standards there are fewer chances of being locked in by a specific technology and/or vendor. Since the specifications are known and open, it is always possible to get another party to implement a similar solution adhering to the standards being followed. – It can be easier for systems from different parties or using different technologies to interoperate and communicate with one another. There is, in a sense, an improved data interchange and exchange, while it is not necessary to use
Towards formal open standards: formalizing a standard’s requirements
the same software or software from a particular vendor to read or write data files. Example A big organization requires that all its offices use office software applications that can read and write files using the Open Document format. Each office has the flexibility of using whatever office software is best suited for it and at the same time be able to read, write and exchange documents with other offices of the organization. – Protection against applications going extinct. That is, if the data file format used is proprietary then, should the application become obsolete, the user may have a difficult time converting the data files to another format needed by a new application. However, if the data format follows an open standard and, hence, is known, either the new application will be able to use it as it is or it will be easier to convert the data so that the new application can use it. – The data storage/access issue is even more important for big organizations. Think, for instance, of the needs of governments and police stations nationwide (see e.g., [29]) or national Electronic Health Records ([33]). 2.1.2 Problems of open standards Notwithstanding the foregoing, there are some problems of Open Standards that are worth mentioning; First of all, the whole process of reviewing, rating and implementing a standard takes a considerable amount of resources (time, money, specific knowledge skills and expertise, etc.). Depending on the standards setting organization that is dealing with the development of that particular standard the whole process can take from a few months to many years. Innovation usually happens for business purposes where there is a time constraint. Last but not least, interoperability is not only achieved via open standards. There are many examples of (not open) standards released under Reasonable And NonDiscriminatory terms (RAND) that have posed no barriers to interoperability (e.g., GSM, CD, DVD, MPEG, Wi-Fi, etc.) [22,37].
The algebraic approach of formal specifications focuses on specifying a system in terms of its operations and the relationships between those operations. Types of data are formally specified along with operations on those data types. The implementation details such as the size of representations are quite abstract in nature. In fact, as mentioned in [39], a formal specification describes what the system must do without saying how it is to be done. This abstraction enables us to easily understand a system’s functionality without going through pages of code that describe this implementation [39]. Of course not all properties can be algorithmically proven as Gödel’s incompleteness theorems state [1], so what guarantees that those properties of the formal specifications that are of interest do not belong to the not-provable class is a task left to the specification expert [8]. While formal methods allow for more precision, clarity and provide a property verification backbone, the popularity of them is not that high. Formal methods are mostly used in systems where a mistake would be disastrous [12]. Apparently the software industry is not very willing to change its current process of research and development and adopt new techniques that would require training the current staff to understand or to hire new employees specialized in formal methods [8]. That is not necessarily true; while a formal specification requires investing a lot more effort upfront (taking time to mathematically model the constructs in the early stages of software development), it could be overall more efficient: enforcing a detailed analysis of the requirements reduces errors due to poor understanding or bad system design [38]. Incomplete and inconsistent requirements can be discovered and resolved early on, so designers do not have to go back to earlier stages and redesign the system, before going back to testing. According to [39], “a formal specification can serve as a single, reliable reference point for those who investigate the customer’s needs, those who implement programs to satisfy those needs, those who test the results, and those who write instruction manuals for the system”.
2.2 Formal methods and formal specifications
3 The idea Formal methods [11,21] include techniques based on mathematical rigor and have been used to describe a system, analyze its behavior and assist its design by property verification, usually carried out through rigorous and effective automated reasoning tools. A subcategory of Formal Methods is formal specification and is a specification expressed in a language whose vocabulary, syntax and semantics are formally defined (see e.g., [17]). This “formal definition” means that formal specification languages must be based on mathematical concepts (usually: set theory, boolean and predicate logic, and algebra) whose properties [30] are well understood [38].
The main idea of this paper is the usage of formal methods to create a formal specification for an open standard—a Formal Open Standard. This resulting formal specification is very different in nature than a natural language specification and we believe that the benefits of going down that road outweigh the cost. We advocate that this type of formal specification should accompany an open standard alongside with its release to the public (or even with a small delay, as mentioned in [12]) and that the development of this formal specification should be done along with the development of the standard, for reasons presented in the next section.
123
K. Barlas et al.
For this paper, since we are not part of any standard development process, we had to follow a different path; this paper attempts to formalize an open standard, RSS v2.0, and tries to show how writing such a specification analyses the standard in such a depth that it provides clarity and how the resulting specification can be read and understood, due to the nature of formal methods, in only one way; the correct one. RSS v2.0 is an already established web standard, so we could only create a specification of our understanding of the natural language documentation that the developers released with the protocol [2], which—of course—is not as effective as doing that alongside the standards’ development. We chose to formalize RSS v2.0 in the CafeOBJ algebraic specification language (more about that in Sect. 4.2), but any other formal specification language will work just as well. There are many languages to choose from, each one with its own advantages and disadvantages, and essentially it all comes down to personal preference, but our idea is not bound to any specific specification language. Formalizing open standards can be done in many ways; we are only presenting one in this paper. 3.1 Benefits of the method A formal specification for an open standard can be beneficial in many ways: 1. Fewer ambiguity issues: Specifications written in natural language are informal and usually contain ambiguities. Even if a standard’s natural language specification is written very carefully, no one can ensure that, when an individual reads this specification in order to make use of the standard, his/her understanding of how things should work will match exactly what the designers of the standard had in mind. Also, the context is often assumed in case of a specification done in a natural language format; a medical appliance will assume a medical background which is usually not well defined. Using natural languages to formulate requirements during phases can lead to misinterpreting errors, due to the obvious linguistic ambiguities [9,28]. Applying a degree of formalism eliminates that problem [12], as it makes designers ask the right questions and also improves the level of understanding, a key point in order to achieve a desired level of (software) quality. While a natural language specification of the open standard cannot really be eliminated, as it is rather more natural for humans to start with such a specification, its involvement could be minimized. So, instead of using a natural language specification all the way, from the requirements part of the standard building process, then to the analysis part, later to the building part, then to the implementation part and finally to the installation part,
123
it can be used during the requirements phase and then only use it to supplement the formal version, in the form of accompanying comments. Formalizing that means of communications reduces ambiguity, since mathematical specifications can only be interpreted in one way [17,30]. 2. Verbosity of the specification: Formal specifications of standards are usually significantly more compact than the ones written in natural languages, a property that can be attributed to the nature of mathematics and to the abstract nature of formal specifications. Another factor that reduces the size of a formal specification is the re-usability of modules; a well written specification of a small module can be applicable in other bigger systems as well. 3. More concrete system design: A standard’s interoperability depends on the precision of its requirements. The better the requirements of a standard are specified, the easier it is to make full, correct use of the standard, especially when it is part of an interoperable system. Standards such as file systems, network protocols or operating systems are areas where this method could be even more beneficial as there has been little work in providing a minimum set of requirements among such standards. 4. Property checking and verification: Depending on the kind of formal specification language we are using, the formal specification can be checked for validity (is the standard doing what it is supposed to do) and an implementation of it can be verified. Verification most often makes sure that an implementation of the standard satisfies its specification. For instance, we could verify that an implementation of a standard upholds some properties as expressed in the requirements. Algebraic specification methodologies provide this kind of extensions. Of course the formal specification may not be always executable, so there is transformation needed towards an implementation language. If the transformation is not algorithmically and automatically done with correctness preserving transformations, nothing is gained [16]. The cost of applying formal methods to (parts of) systems is still uncharted territory. Sommerville [38] claims that the cost of doing that is not higher that using conventional methods, it is just differently allocated as more time, effort and money is invested in the early stages of development (the formal specification part mostly) rather than in the latter stages (verification). Bowen and Hinchey [12] claim that while a company might face extra costs when it tries formal methods for the first time, eventually they run just as cheap as conventional methods, if not cheaper. The cost of introducing formal methods to a company’s research and development process is considered high partly because of the need to either hire specialized staff or to train the existing. The authors of this paper have positive
Towards formal open standards: formalizing a standard’s requirements
educational experiences in the process of teaching formal specification to students of a masters course. The teaching approach combined Problem Based Learning (PBL), Problem Focused Education (PFE) and feedback mechanisms in order to monitor the learning and application of formal modeling concepts [41]. Within 3 months the students had reached a level of acquaintance with those concepts that enabled them to understand a protocol by reading its formal specification and write some basic specifications of protocols, as part of their class requirements. One other interesting approach when it comes to educating students in formal methods can be found in [31]. Newcombe et al. claim that Amazon’s engineers reached a useful level in their understanding of formal methods within 3 weeks and that the benefits of using formal methods have been too significant to ignore [35].
3.2 Related work The idea of formalizing open standards is not new; standardsetting organizations like the International Organization for Standardization (ISO), ITU Telecommunication Standardization Sector (ITU-T) and the International Electrotechnical Commission (IEC) have projects (some ongoing) that aim to express (parts of) their standards into a formal way. Reference Model of Open Distributed Processing (RM-ODP), for example, had its architectural semantics formalized in four formal specification languages (LOTOS, ESTELLE, SDL and Z) [25] and later in UML 2 [27]. There is also a guidelines standard that explains such Formal Description Techniques [24,40]. [40] is also interesting as it shows how the same protocol can be described in different specification languages. Another related project is an ISO/ITU-T effort to standardize a definition for conformance, in the context of formal description techniques [26]. Unlike those standards, this paper focuses on a less abstracted, software-oriented standard that is utilized as a web service. In other modeling approaches that we adopted in the past for other standards we used a variety of formal methods; computational models in particular [6,23,42]. One of them, for instance, was computational modeling with the use of context-free grammars (CFGs) in terms of Backus–Naur Form (BNF) for their appropriateness in implementing a new international paper mill efficiency standard. It was a contextual domain (paper mill) meta-requirement to use computational correctness criteria to model and verify timed events [42]. Hence, under certain circumstances and system conditions, system engineers and meta-modelers could and should make it possible to combine both BNF and a formal, computational language and/or a combination of other methods, according the rules of Metamodelling and Method Engineering (ME) [5–7].
4 Test case In this section, we will describe the RSS v2.0 open standard (Sect. 4.1) and specify it using the CafeOBJ (Sect. 4.2) algebraic specification language. Before doing so, we will first need to create a specification of the XML language (Sect. 4.3) in CafeOBJ and then use this specification to continue with the RSS formalization (Sect. 4.4). 4.1 RSS v2.0 Rich Site Summary (RSS) is an extensible markup language (XML) based document format for the syndication of web content so that it can be republished on other sites or downloaded periodically and presented to users. RSS allows users to avoid browsing all of the websites they are interested in manually, and instead subscribe to websites so that all new content is automatically checked and aggregated by their browsers as soon as it is available. RSS feeds can be read using software called an “RSS reader”, “feed reader”, or “aggregator”, which can be web based, desktop based, or mobile device based. While the protocol lost some of its momentum after 2013, when google reader was shut down, it is still a widely used standard that just got shifted to the background of web services (e.g., iOS 9’s news app that uses RSS feeds to deliver stories to the user [36]). The RSS 2.0 specification [2] describes how to create RSS documents. RSS is a dialect of XML so all RSS files must conform to the XML 1.0 specification, as published on the World Wide Web Consortium (W3C) website. Listing 1 shows a basic RSS XML file [3]. At the top level, a RSS document is a element, with a mandatory attribute called version, that specifies the version of RSS that the document conforms to. Below the element, there is a single element, which contains information about the channel (metadata) and its contents [2]. A may also contain any number of structures. The item may represent a “story” and contains elements like title, description and link [2]. Further explanation of Listing 1 can be seen in Sect. 4.3. Liftoff News http://liftoff.msfc.nasa.gov/ Liftoff to Space Exploration. en−usTue, 10 Jun 2003 04:00:00 GMT http://blogs.law.harvard.edu/tech/rssWeblog Editor 2.0[email protected][email protected]
123
K. Barlas et al. Star City http://liftoff.msfc.nasa.gov/news/ 2003/news−starcity.asp How do Americans get ready to work with Russians aboard the International Space Station? They take a crash course in culture, language and protocol at Russia’s Star City.Tue, 03 Jun 2003 09:39:21 GMT http://liftoff.msfc.nasa.gov/2003/06/03.html #item573 Listing 1 A sample RSS XML file
4.2 Formal specification using the OBJ tradition The formal specification (as explained in 2.2) of the RSS standard has been done using a member of the OBJ algebraic programming and specification language family. The OBJ languages are broad spectrum algebraic programming and specification languages, based on order sorted equational logic, possibly enriched with other logics (such as rewriting logic, hidden equational logic, or first order logic), and providing the powerful module system of parameterized programming. “OBJ” is the name referring to the language family, while “OBJ1”, “OBJ2”, “OBJ3”, “CafeOBJ”, etc., refer to particular members of the family [18]. OBJ has been used for many applications [20], including debugging algebraic specifications, rapid prototyping, defining programming languages in a way that directly yields an interpreter, specifying software systems (GKS graphics kernel system, Ada configuration manager, MacIntosh QuickDraw, etc.), hardware specification, simulation, specification and verification of imperative programs, user interface designs and theorem proving. OBJ3 has been used for building FOOPS (an object-oriented specification and programming system), the Eqlog system for equational logic (or relational) programming, OOZE (an object-oriented specification language influenced by Z), etc. CafeOBJ is a recent development in the OBJ community and is the specification language used in this test case. The main underlying logics of CafeOBJ are order-sorted algebras [13,19] and hidden algebras [13,14]. The former are used to specify abstract data types while the latter specify abstract state machines, providing support for object-oriented specifications. Listing 2 displays a sample CafeOBJ module that declares the factorial operation. Keyword mod! starts the module’s
123
declaration using tight semantics. The curly brackets start and close the module’s declaration. Protecting(INT) (or its short version pr(INT)) protects the built-in integer module, inheriting its sorts, operators and equations. The keyword op is used to declare non hidden operators. The operator fact takes an integer (arity) and creates an integer too (coarity). The var keyword declares a variable (X) of Int sort. The next two lines starting with ceq contain the conditional equations of the fact operator, the recursive equation for the factorial and the base case. We can then open the module we just declared with the select command and ask CafeOBJ to give us the factorial of 5 with the reduce command. The CafeOBJ system uses declared equations as left-to-right rewriting rules and reduces a given term. mod! FACT{ protecting(INT) op fact : Int −> Int var X : Int ceq fact(X) = X ∗ fact(X − 1) if X > 0 . ceq fact(X) = 1 if not (X > 0) . } select FACT . red fact(5) . Listing 2 A simple CafeOBJ module: factorial
4.3 XML structures An RSS file is an XML file. XML is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. It is an open standard, defined in the XML 1.0 Specification produced by the W3C. Although the design of XML focuses on documents, it is widely used for the representation of arbitrary data structures, for example in web services. In fact, XML is the most common tool for data transmissions between all sorts of applications and it is now as important for the Web as HTML was to the foundation of the Web [44]. XML documents form a tree structure that starts at the “root” and branches to the “leaves”. Listing 1 (the sample RSS file) is almost a proper XML file, if we were to replace the first line (the “rss” element with ). This first line is now the XML declaration, defining the XML version (1.0). XML documents must contain a root element, so the next line describes the root element (the parent) of the document, channel. Channel contains ten child items (Title, Link, Description, Language, pubDate, etc.). The elements in an XML document form a document tree. The tree starts at the root and branches to the lowest level of the tree. The terms parent, child, and sibling are used to describe the relationships among elements. Parent elements have children. Children on the same level are called siblings [44]. Title, link, description,
Towards formal open standards: formalizing a standard’s requirements
language, etc., for example, are siblings. Item is a parent of five children (Title, Link, Description, PubDate and guid). An XML tag is a markup construct that begins with “<” and ends with “>”. There are three kinds of tags: start-tags (e.g., ), end-tags (e.g., < /channel>) and emptyelement tags (e.g., ). Tags are case sensitive. An XML element is everything from (including) the element’s start tag to (including) the element’s end tag. All XML elements must have a starting and a closing tag. An element can contain another element, text, attributes or a mix of all of the above. Attribute values must always be quoted; either single or double quotes can be used. An XML attribute can only have a single value and each attribute can appear at most once inside each element. Listing 3 XML declaration File
An XML document that follows the syntax rules mentioned is called “well formed”. To make sure that an XML document is well formed, we usually use some XML validator. That is different though from what we call a “valid” XML document, a document that is well formed and also conforms to a specified document type. A document type is a set of rules that define legal elements and attributes for XML documents; what may appear inside an XML document and its properties. There are different types of document definitions that can be used with XML: “Document Type Definition” (DTD) and the XML-based “XML Schema”. An XML document validated against either a DTD or an XML Schema is both well-formed and valid. 4.3.1 XML2OBJ XML2OBJ is a CafeOBJ project aimed to create a framework for describing XML structure while providing methods that can parse an entire XML tree structure from a file and can: – find a specific element (as it is identified by its tag) and— if necessary—by its parent element, if it exists – return such an element or return an element’s content – check whether an element has attributes and return both the names of the attributes and the assorted values. XML2OBJ is a CafeOBJ module that contains all the sorts and operators needed in order to achieve this kind of functionality and is a module that can be imported to any other CafeOBJ project that requires XML functionality. This module on its own is not a Document Type Definition (DTD) [43], as it only provides the methods that can be used to parse information from an XML File. The definition of the legal building blocks of the XML document is done later on (Sect. 4.4). To use an XML file in this module requires a few syntactical changes, mostly for readability issues.
Fig. 1 XML2OBJ element syntax and corresponding sorts
4.3.2 Description and syntax differences In XML2OBJ, an element follows the syntax displayed in Fig. 1. The whole block that begins with “<” and ends with “>” is of Element sort. The “Tag Name” which is the name of the XML element is of ElemName sort. The “Attributes” is of an AttribList sort and it can either be no Attrib if the element has no attributes or ((“Attribute1 N ame”@ = “Attribute1 V alue”) @(“Attribute2 N ame”@ = “Attribute2 V alue”)@...) in case there are one or more attributes. The no Attrib operator denotes an empty list of attributes. The @= operator assigns a value to an attribute name, while the @ operator joins parenthesized attributes. The “Element value” is of ContentList sort and -in the case of RSS—it can be either text, number, date, or another element. So, the ElementValue can be one of the following: t xt (“T ext V alue”)|nat (I nteger V alue)|dat (DateV alue) or another element if we are dealing with nested XML elements. That tells us that the Element sort is a subsort of Content, so we can declare the sort relation [Element < Content] which of course can be applied to the list versions of those sorts ([Element List < Content List]) too. There is no end tag that marks the end of an element as this is taken care of by the whole “< ... ... [...] >” syntax. Nesting of XML elements follows the syntax of Fig. 2. Elements Nested Tag_1 and Nested Tag_2 are children of “Parent” element. The @ operator concatenates elements. The conversion of an XML file to a XML2OBJ file is done automatically by a simple application. If we were to convert the XML file of Listing 1 (with the first line replacement of Listing 3) to a XML2OBJ file using this tool, we would get what is partially depicted in Listing 4. < ‘‘Channel’’ noAttrib [ (< ‘‘Title’’ noAttrib [ txt(‘‘Liftoff News’’)]>)@ (< ‘‘Link’’ noAttrib [ txt(‘‘http://liftoff.msfc.nasa.gov/’’) ]>)@
4.4 Algebraic specification of the RSS protocol Having imported the XML2OBJ module, we will now provide a DTD-equivalent for RSS in CafeOBJ. Doing so we gain advantages that regular DTDs do not support, like validation of the actual data (something that an XML Schema can take care of). For example, a DTD allows no constraints on character data. If character data is allowed then any character data is allowed. What an XML Schema (or a DTD) cannot do is more complex operations, like comparing an element’s value with another element’s value, or reusing a constraint for every element that uses the same type, without having to type that over and over again. Our approach can handle those things quite easily. 4.4.1 Specification approach RSS specification files are quite different from the kind of documents you would expect to find, for example in an ISO standard specification document. This is because, while every single parameter of RSS is explained, sometimes the explanation can be messy and sometimes unclear. The RSS v2.0 specification files provided in [2] may seem to be quite verbose (as all natural language specifications) but when compared to the size of a regular specification, that is still considered to be a small-sized specification. Each element of RSS v2.0 will be described in a separate module. We try to keep those modules as modularized as possible so that we can reuse them in other parts of the specification. Each such module is named after the RSS element that it describes and it contains the name of the according XML tag. It also contains part of the natural language specification [2] in the form of comment that complements the formal description. Finally, such a module contains the requirements for the element, if any.
123
We start the specification by declaring all the auxiliary modules that we need. The most important ones are: – A module that defines generic list structures and the operations that can happen to them. A list here can be a collection of items, appended with the @ operator. XML elements can play the role of items. – A module defining date structures and operations with dates, like extracting the day, month, year, etc., out of a date structure, or checking if a date is proper (month cannot be less than 1 or more than 12 and so on). – The XML module, as talked about in Sect. 4.3.1. These modules are declared before the channel’s specification, as CafeOBJ follows a bottom-up approach: the approach of piecing together systems to give rise to more complex systems, thus making the original systems sub-systems of the emergent system. Most of RSS’ elements can be classified into three categories, depending on the type of data they describe; strings, numbers and dates. Since most elements of a category are similar in nature, instead of writing a dedicated module for each and every one of them, we can create a template module for each category and extend it(using CafeOBJ’s importing/renaming methods) to create each module we need, significantly reducing the specification’s size [15]. Such a module can be used as a building block, containing an operator that extracts the content of the element, an operator that extracts the attributes list (if there is one), and a constant that denotes the name of the element. Such a generic module (constructor, declared with the ∗ symbol) for elements that have text values can be seen in Listing 5. The sort ElementList describes lists of elements, the sort titled ElemName describes the identifier for each element, the tag in the original XML description and the sort AttribList describes lists attributes. The operator getxmlcontent retrieves the value of the element. mod∗ BUILDINGBLOCK { pr(XML) op get−xmlcontent : ElementList −> String . op get−attributes : ElementList −> String . op XML−TitlePrefix : −> ElemName . eq XML−TitlePrefix = ‘‘’’ . vars X X1 : ElemName . vars A A1 : AttribList . var S : String . vars EL EL1 : ElementList . ceq get−xmlcontent( < X A [ txt(S) ]>) = S if (X = XMLTitlePrefix) and (A = noAttrib ) . ceq get−xmlcontent( ( < X A [ txt(S) ]>) @ EL1) = S if (X = XMLTitlePrefix) and (A = noAttrib ) . ceq get−xmlcontent( ( < X A [ txt(S) ]>) @ EL) = get−xmlcontent(EL) if not((X = XMLTitlePrefix) and (A = noAttrib)) .
Towards formal open standards: formalizing a standard’s requirements ceq get−xmlcontent(< X A [EL]>) = get−xmlcontent(EL) if not((X == XMLTitlePrefix) and (A = noAttrib)) . ceq get−xmlcontent( < X A [ ( < X1 A1 [ EL ]>) ]>) = get−xmlcontent( < X1 A1 [ EL ]> ) if not((X = XMLTitlePrefix) and (A = noAttrib )) . } Listing 5 Building block for string elements
There are two more similar building blocks for elements containing numbers and date structures, protecting the according building block and renaming their operators accordingly. In order to define an element of string type (e.g., Title, Enclosure, Language, etc.), all we have to do is to protect the building block module and change the generic name of the get-xmlcontent operator to get-ElementName, the name of the XML-TitlePrefix to ElementName-XMLPrefix and then set its value. In case the element we want to specify has attributes, we just add one operator for each attribute and define its value. In Listing 6 we see how the enclosure module has three attributes; URL, Length and Type. mod! ENCLOSURE { −− Describes a media object that is attached to the item. −− It has three required attributes. url says where the enclosure is located, length says how big it is in bytes, and type says what its type is, a standard MIME type. The url must be an http url. −− example: pr(BUILDINGBLOCK ∗ {op get−xmlcontent −> get−enclosure, op XML−TitlePrefix −> Enclosure−XMLPrefix}) eq Enclosure−XMLPrefix = ‘‘Enclosure’’ . op Enclosure−URLXMLPrefix : −> ElemName . eq Enclosure−URLXMLPrefix = ‘‘url’’ . op Enclosure−LengthXMLPrefix : −> ElemName . eq Enclosure−LengthXMLPrefix = ‘‘length’’ . op Enclosure−TypeXMLPrefix : −> ElemName . eq Enclosure−TypeXMLPrefix = ‘‘type’’ . } Listing 6 Element with attributes: Enclosure
Some elements can benefit from some check operators that can make sure that the element described works as intended. For example, the Language element has a list of allowed values. So we only accept a language element if the value of the element matches one of the ones declared in the properlanguage? equation (Listing 7). mod! LANGUAGE { −− The language the channel is written in. This allows aggregators to group all Italian language sites, for example, on a single page. A list of allowable values for this element, as provided by Netscape, is here. You may also use values defined by the W3C. −− example: en−us pr(BUILDINGBLOCK ∗ {op get−xmlcontent −> get−language, op XML−TitlePrefix −> Language−XMLPrefix})
eq Language−XMLPrefix = ‘‘Language’’ . op properlanguage? : String −> Bool var L : String . eq properlanguage?(L) = if L = ‘‘en−us’’ or L = ‘‘el−gr’’ −− or L = ‘‘...’’ then true else false fi . } Listing 7 Element with validation: Language
Finally, we suggest keeping parts of the natural language specification of the standard as comments in the according modules, as can be seen in Listings 6 and 7. Such comments can help someone who wants to learn about what each module does without the need to refer to the documentation. 4.4.2 Channel The channel (Listing 8) is the most important block of the RSS specification. It imports the three mandatory modules (elements) (Title, Link and Description) and all the other optional elements (CHANNEL-OPTIONAL-ELEMENTS), a module that imports every single one of the channel’s optional modules and its purpose is to keep the code readable. After all, clarity of presentation is important. Due to the way each imported module imports others on its own, module channel essentially imports every module of the specification. Then we define a channel’s XML prefix and finally we have an operator called properchannel? which inspects a channel element and checks it for validity. The Properchannel? operator states the Channel’s requirements, as deduced from [2] 1. that the root element of the file is “Channel” and that is declared without any attributes. 2. that there is a Title element declared without any attributes, that is a child of the Channel element; Title can also be nested under, e.g., Item, so it is important that we declare this condition too. The same things apply for Link and Description elements. 3. Moving on to the optional elements: Properchannel? operator checks that if there is a Language element, it has got to be declared without any attributes and that the element value is within the list of accepted language codes (e.g.. “en-us”). This element states the language the channel is written in, allowing aggregators to group all Italian language sites, for example, on a single page [2]. 4. If there is a WebMaster (email address for person responsible for technical issues relating to channel) element, it has got to be declared without any attributes. The same thing applies also for the ManagingEditor (the email
123
K. Barlas et al.
mod CHANNEL { pr(TITLE + LINK + DESCRIPTION + CHANNEL−OPTIONAL−ELEMENTS) op Channel−XMLPrefix : −> ElemName . eq Channel−XMLPrefix = ‘‘Channel’’ . op properchannel? : ElementList −> Bool . vars X X1 : ElemName . vars A A1 : AttribList . var S : String . vars EL EL1 : ElementList . vars PUBDATE TODAY : Date . var TTL1 : Nat . eq properchannel?(< X A [ EL ]>) = if ( (X = ChannelXMLPrefix) and (A = noAttrib) ) and ( ElemNameexists?(Title−XMLPrefix, EL) and (getxmlatt(Title−XMLPrefix, EL) = noAttrib) and (getparent(Title−XMLPrefix, < X A [ EL ]>) = Channel−XMLPrefix) ) and ( ElemNameexists?(Link−XMLPrefix, EL) and (getparent(Link−XMLPrefix, < X A [ EL ]>) = Channel−XMLPrefix) and (getxmlatt(Link−XMLPrefix, EL) = noAttrib) ) and ( ElemNameexists?(Description−XMLPrefix, EL) and (getparent(Description−XMLPrefix, < X A [ EL ]>) = Channel−XMLPrefix) and (getxmlatt(Description−XMLPrefix, EL) = noAttrib) ) and (ElemNameexists?(Language−XMLPrefix, EL) implies ((getxmlatt(Language−XMLPrefix, EL) = noAttrib) and properlanguage?(get−language(EL))) ) and (ElemNameexists?(WebMaster−XMLPrefix, EL) implies getxmlatt(WebMaster−XMLPrefix, EL) = noAttrib) and (ElemNameexists?(Managing−EditorXMLPrefix, EL) implies getxmlatt(ManagingEditor−XMLPrefix, EL) = noAttrib ) and (ElemNameexists?(Copyright−XMLPrefix, EL) implies getxmlatt(Copyright−XMLPrefix, EL) = noAttrib) and (ElemNameexists?(Docs−XMLPrefix, EL) implies getxmlatt(Docs−XMLPrefix, EL) = noAttrib ) and (ElemNameexists?(Rating−XMLPrefix, EL) implies getxmlatt(Rating−XMLPrefix, EL) = noAttrib ) and (ElemNameexists?(Generator−XMLPrefix, EL) implies getxmlatt(Generator−XMLPrefix, EL) = noAttrib) and ((ElemNameexists?(TTL−XMLPrefix, EL)) implies (getxmlatt(TTL−XMLPrefix, EL) = noAttrib)) and ((ElemNameexists?(PubDate−XMLPrefix, EL) and (getparent(PubDate−XMLPrefix, < X A [ EL ]>) = Channel−XMLPrefix)) implies (getxmlatt(PubDate−XMLPrefix, EL) = noAttrib and properdate?( get−pubdate(EL))) ) and (ElemNameexists?(SkipHours−XMLPrefix, EL) implies ((getxmlatt(SkipHours−XMLPrefix, EL) == noAttrib) and validhour(returnxmlnode(SkipHours−XMLPrefix, EL), hour(today)))) and (ElemNameexists?(SkipDays−XMLPrefix, EL) implies ( (getxmlatt(SkipDays−XMLPrefix, EL) == noAttrib) and validday(returnxmlnode(SkipDays−XMLPrefix, EL), dayT(today)))) and (ElemNameexists?(Cloud−XMLPrefix, EL) implies propercloud?(returnxmlnode(Cloud−XMLPrefix, EL)) ) and (ElemNameexists?(TextInput−XMLPrefix, EL) implies propertextinput?(returnxmlnode( TextInput−XMLPrefix, EL))) and (ElemNameexists?(Item−XMLPrefix, EL) implies properitem?(returnxmlnode(Item−XMLPrefix, EL))) and (ElemNameexists?(Image−XMLPrefix, EL) implies properimage?(returnxmlnode(Image−XMLPrefix, EL))) and (ElemNameexists?(LastBuildDate−XMLPrefix, EL) implies ( (getxmlatt(LastBuildDate−XMLPrefix, EL) == noAttrib) and properdate?(get−lastbuilddate(EL))) ) then true else false fi . } Listing 8 Channel module
address of the person responsible for editorial content), Copyright (the copyright notice for content in the channel), Docs (a URL that points to the documentation for the format used in the RSS file, probably a pointer to [2]), Rating (the PICS rating for the channel), Generator (a string indicating the program used to generate the channel), TTL (“time to live”; a number of minutes
123
that indicates how long a channel can be cached before refreshing from the source) elements. op validfeed? : Date Nat Date −> Bool . eq validfeed?(PUBDATE, TTL1, TODAY) = if (timestamp(year(PUBDATE), month(PUBDATE), day(PUBDATE), hour(PUBDATE), minutes( PUBDATE),seconds(PUBDATE)) + (TTL1
Towards formal open standards: formalizing a standard’s requirements ∗ 60) >= timestamp(year(TODAY), month(TODAY), day(TODAY), hour(TODAY), minutes(TODAY), seconds(TODAY))) then true else false fi . Listing 9 validfeed? operator
5. TTL is a gray area of the specification as some sources ([45]) mention that TTL indicates how long can a feed stay alive after its publication, which is different than what [2] mentions. If that is the case, then we could add a validfeed? operator that takes in a publishing date (PUBDATE), a Time-to-live (TTL1) and today’s date (TODAY ) and checks whether the place in time that we get if we add TTL1 to the publishing date is sooner or later than the current date and time (Listing 9). If this is an actual requirement out of the RSS protocol then we can add a clause to Listing 8 that checks that if a TTL element is present, its content has to validate validfeed? operator. 6. If there is a PubDate (the publication date for the content in the channel) element that is a child of Channel (since it can also be a child of item), it has got to be declared without attributes and the actual date is “proper”; that means that the months range within 1–12, the dates within 1– 31, the hour within 00-23 and so on. The properdate? operator is declared inside the auxiliary DATE module and returns true if the date fulfills all those conditions and false in other cases. LastBuildDate (the last time the content of the channel changed) is similar, minus the Channel parent requirement. 7. The Cloud element allows processes to register with a cloud in order to be notified of updates to the channel, implementing a lightweight publish-subscribe protocol for RSS feeds. A cloud has five required attributes: domain, which is the domain name or IP address of the cloud, port, which is the TCP port that the cloud is running on, path, which is the location of its responder, registerProcedure, which is the name of the procedure to call to request notification, and protocol, which is xml-
eq validhour((< X A [ ( < X1 A1 [ nat(N) ]>) ]>), CURRENTHOUR) = if ( (X == SkipHours−XMLPrefix) and (A == noAttrib) and (X1 == Hour−XMLPrefix) and (A1 == noAttrib) and ( (N >= 0) and (N <= 23) ) and not(N == CURRENTHOUR) ) then true else false fi . Listing 11 validhour operator
rpc, soap or http-post (case-sensitive), indicating which protocol is to be used [2]. A sample cloud element looks like this: The properchannel? operator makes sure that if a cloud element exists, it has to be proper. This is checked by the propercloud? operator (Listing 10) that checks that each one of those—required—attributes is declared. 8. SkipHours is an element that contains up to 24 sub-elements whose value is a number between 0 and 23, representing a time in GMT, when aggregators (if they support the feature) cannot read the channel on any of the hours listed in the skipHours element [2]. The properchannel? operator ensures that if the skipHours element exists, it is declared without any attributes and that each child of that node contains a valid hour (validhour? operator—Listing 11): an hourrepresenting number (between 0 and 23) that cannot be the same as the current hour, as checked against the current hour (hour(today)). Hour is an operator that returns the hour of the “today” date constant that is declared in the beginning of the specification. SkipDays is very similar in concept; it is an XML element that contains up to seven sub-elements whose value can be Monday, Tuesday, etc. Aggregators may not read the
eq propercloud?(< X:ElemName A:AttribList [ txt(S:String) ]>) = if (X == Cloud−XMLPrefix) and (xmlattexists?(CloudDomainXMLPrefix,getxmlatt(Cloud−XMLPrefix, < X A [ txt(S)]>))) and (xmlattexists?(CloudPortXMLPrefix,getxmlatt(Cloud−XMLPrefix, < X A [ txt(S)]>))) and (xmlattexists?(CloudPathXMLPrefix,getxmlatt(Cloud−XMLPrefix, < X A [ txt(S)]>))) and (xmlattexists?(CloudRegisterProcedureXMLPrefix,getxmlatt(Cloud−MLPrefix, < X A [ txt(S)]>))) and (xmlattexists?(CloudDomainXMLPrefix,getxmlatt(Cloud−XMLPrefix, < X A [ txt(S)]>))) then true else false fi . Listing 10 propercloud? operator
123
K. Barlas et al.
eq properimage?(< X A [ EL @ EL1 ]>) = if (X == Image−XMLPrefix) and (A = noAttrib) and (ElemNameexists?(Title−XMLPrefix, (EL @ EL1)) and (getxmlatt(Title−XMLPrefix, (EL @ EL1)) == noAttrib)) and (ElemNameexists?(Link−XMLPrefix, (EL @ EL1)) and (getxmlatt(Link−XMLPrefix, (EL @ EL1)) == noAttrib)) and (ElemNameexists?(URL−XMLPrefix, (EL @ EL1)) and (getxmlatt(URL−XMLPrefix, (EL @ EL1)) == noAttrib)) and (ElemNameexists?(Description−XMLPrefix, (EL @ EL1)) implies (getxmlatt(Description−XMLPrefix, (EL @ EL1)) == noAttrib)) and (ElemNameexists?(Width−XMLPrefix, (EL @ EL1)) implies properwidth?(getwidth((EL @ EL1)))) and (ElemNameexists?(Height−XMLPrefix, (EL @ EL1)) implies properheight?(getheight((EL @ EL1)))) then true else false fi . Listing 12 properimage? operator
channel during days listed in the skipDays element [2]. Properchannel? makes sure that there are no attributes and that each of the children of the node (denoting days) is different from the current day. 9. The TextInput element is somewhat of a mystery. You can use it to specify a search engine box, or to allow a reader to provide feedback. In any case, most aggregators ignore it [2]. However, a channel may optionally contain a textInput sub-element, which contains four required sub-elements: title (the label of the Submit button in the text input area), description (explains the text input area), name (the name of the text object in the text input area) and link (the URL of the CGI script that processes text input requests). Properchannel? makes sure that if there is a textinput, it has to be valid, as checked by the propertextinput? operator that is declared in the TEXTINPUT module. This operator checks that textinput is declared without attributes and that each possible children is also declared this way. 10. The Image element specifies a GIF, JPEG or PNG image that can be displayed with the channel. It contains three required and three optional sub elements. The required elements are: url (the URL of a GIF, JPEG or PNG image that represents the channel), title (describes the image, used in the ALT attribute of the HTML tag when the channel is rendered in HTML) and link (the URL of the site; when the channel is rendered, the image is a link to the site). [2] mentions that in practice the image’s title and link elements should have the same value as the channel’s title and link. While it is not entirely clear if this is a restriction or not, making that mandatory is as simple as having properchannel? retrieve the values for those 3 elements and compare them to the associated nodes, children of the channel element. Properchannel? makes sure that if there is an image element, it has to be valid; the properimage? operator handles that (Listing 12). What this operator does is that it makes sure that all three required elements are there and declared without any attributes, and if there are any of the optional elements, they are proper too.
123
eq properwidth?(W:Nat) = if ((W >= 0) and (W <= 144)) then true else false fi . Listing 13 properwidth? operator
In particular, the width and height elements, have some restrictions to keep in mind: width cannot be more than 144 pixels, height cannot be more than 400. Listing 13 shows properwidth? operator. Properheight? is defined in a similar way. 11. Last but not least, there is the Item element. An item represents a “story”, much like a story in a newspaper or magazine; its description is a synopsis of the story, and the link points to the full story. An item may also be complete in itself; if so, the description contains the text, and the link and title may be omitted. So it makes sense that the specification of RSS states that while all elements of an item are optional, a title or a description must be present [2]. A may contain any number of s. Properchannel? operator makes sure that if there is an element, it has to satisfy properitem? operator’s (Listing 14) requirements. Properitem? makes sure that a Title or a Description is present and that if there are any optional elements, they have got to be declared properly: (a) An enclosure describes a media object attached to the item. It has three required attributes. URL says where the enclosure is located, length says how big it is in bytes, and type says what its type is, a standard MIME type. The url must be an http url. The Properitem? operator checks that if enclosure is declared, all three attributes are present. (b) The link, author, category and comments elements have to be declared without attributes. The PubDate element is similar to the optional child of “channel”. (c) The source element is the name of the RSS channel that the item came from, derived from its . It has one
Towards formal open standards: formalizing a standard’s requirements eq properitem?(< X A [ EL @ EL1 ]>) = if (X == Item−XMLPrefix) and (A == noAttrib) and ((ElemNameexists?(Title−XMLPrefix, (EL @ EL1)) and getxmlatt(Title−XMLPrefix, (EL @ EL1)) == noAttrib) or (ElemNameexists?(Description−XMLPrefix, (EL @ EL1)) and getxmlatt(Description−XMLPrefix, (EL @ EL1)) == noAttrib)) and (ElemNameexists?(Enclosure−XMLPrefix, (EL @ EL1)) implies ( xmlattexists?(EnclosureURLXMLPrefix, getxmlatt(Enclosure−XMLPrefix, returnxmlnode( Enclosure−XMLPrefix, (EL @ EL1) )) ) and xmlattexists?(EnclosureLengthXMLPrefix, getxmlatt(Enclosure−XMLPrefix, returnxmlnode( Enclosure−XMLPrefix, (EL @ EL1) )) ) and xmlattexists?(EnclosureTypeXMLPrefix, getxmlatt(Enclosure−XMLPrefix, returnxmlnode( Enclosure−XMLPrefix, (EL @ EL1) )) ) ) ) and (ElemNameexists?(Source−XMLPrefix, (EL @ EL1)) implies xmlattexists?(SourceURLXMLPrefix, getxmlatt(Source−XMLPrefix, returnxmlnode(Source−XMLPrefix,(EL @ EL1))))) and (ElemNameexists?(Link−XMLPrefix, (EL @ EL1)) implies getxmlatt(Link−XMLPrefix, (EL @ EL1)) == noAttrib) and (ElemNameexists?(Author−XMLPrefix, (EL @ EL1)) implies getxmlatt(Author−XMLPrefix, (EL @ EL1)) == noAttrib) and (ElemNameexists?(Category−XMLPrefix, (EL @ EL1)) implies getxmlatt(Category−XMLPrefix, (EL @ EL1)) == noAttrib) and (ElemNameexists?(Comments−XMLPrefix, (EL @ EL1)) implies getxmlatt( Comments−XMLPrefix, (EL @ EL1)) == noAttrib) and (ElemNameexists?(PubDate−XMLPrefix, (EL @ EL1)) implies ((getxmlatt(PubDate−XMLPrefix, (EL @ EL1)) == noAttrib) and properdate?(getpubdate(EL @ EL1)) )) then true else false fi . Listing 14 properitem? operator
required attribute, url, which links to the XMLization of the source. Example: . Its purpose is to propagate credit for links and to publicize the sources of news items. It can be used in the aggregator’s Post command. It should be generated automatically when forwarding an item from an aggregator to a weblog authoring tool [2]. The properitem? operator checks true if there is a URL attribute. In Listing 8, operator getxmlatt takes an XML element and retrieves its attributes, operator getparent returns the parent node of a given element and operator returnxmlnode returns a node stripped of its parent and/or siblings. The rest of the RSS elements/modules have been omitted from this paper since they have nothing of particular interest to show. This concludes the specification of the RSS v2.0 protocol. The whole specification for RSS (including the imported modules such as XML2OBJ) does not exceed 820 lines in 44 modules, with lots of empty lines to improve clarity and with the comments for each element, as they appear on [2]. If we were to remove the comments, the specification would be around 650 lines of code. 4.4.3 Results We can now test a sample RSS channel (called samplechannel) against the specification that we created, by trying the
properchannel? operator on it. The sample channel is following the syntax of XML2OBJ. Since the whole RSS file is a channel element, if CafeOBJ’s engine reports back that the channel is indeed proper then we know it is valid. Such a sample RSS file and the process of testing it against the specification can be seen in Listing 15. Listing 16 shows the results. After 3909 rewrites of terms, CafeOBJ concludes that the samplechannel holds the properchannel? property, thus we can conclude that it is valid. If we were to change anything in the samplechannel so it does not live up to the specification, CafeOBJ will conclude that the channel is not proper. For example, if we replace the (required) attributes of Cloud with noAttrib (which suggests that Cloud is declared without any attributes), then Listing 17 shows how CafeOBJ outputs that this given channel is not proper (we can enable a more detailed output if we want to see which condition failed to evaluate properly). Any other deviation from the specification guidelines will have a similar result.
4.4.4 How do our claims hold up? We have created a formal executable specification for the RSS v2.0 standard in CafeOBJ that formally describes the requirements of RSS, as explained in [2] and understood by us, that can act as an RSS/XML DTD, as we can check a sample RSS file (converted via XML2OBJ) against it. In retrospect, here are how the original problems of natural lan-
−− opening module CHANNEL.. done. −− reduce in (properchannel?(samplechannel)):Bool (false):Bool (0.000 sec for parse, 3854 rewrites(4.140 sec), 30074 matches)
Listing 16 The result of Listing’s 15 reduction
Listing 17 What happens if the sample file is not valid
guage specifications (as explained in Sect. 1) were improved by creating a formal specification: – Verbosity: Comparing it to the 820 lines of code, the original specification is rather bigger in size; there are elements described in [2] that link to external pages that provide the proper specification of them. Just the Date and Time Specifications, as they appear in RFC 822, are 40 pages big. So, not only do we get a significantly smaller
123
sized specification, but it is also one that can be used as a document type. – Clarity: It is safe to assume that a standard that has been released to the public is (almost entirely) free of errors, omissions or ambiguities as it has been exhaustively tested. This means that we did not expect to find any critical ambiguity issues in this test case; however, studying the RSS v2.0 specification from [2] we could identify 11 groups of requirements, as they appear in Sect.
Towards formal open standards: formalizing a standard’s requirements
4.4.2, and faced a few problems with some elements, e.g., TTL or the title and link sub-elements of the image element. We could only specify what we thought that the developers originally intended (not allowing us to check its correctness) and this may be or may not be correct with regards to the original idea, but we see how writing a formal specification makes us scrutinize the standard. Asking the right questions while writing a specification for a new standard allows us to develop a very clear, “under the microscope”, view of the developers’ intentions and avoid misconceptions as to how the standard is supposed to work. At the same time, if we were to read such a formal specification of the RSS v2.0 protocol, the requirements would be quite clear to deduce and understand in one way only—the correct one, and we could verify that an implementation of the standard is working as it should. – Requirements amalgamation: The properchannel? operator displays each distinct requirement of the standard. This is probably the only area of the formal specification that is more verbose than the natural language version, as each element’s requirements is separate from the other elements whereas in the specification many elements’ requirements are grouped together. This way we can easily isolate any requirements we want in that operator and see how a change in the design of the RSS standard affects the big picture. Writing a formal specification of a standard is not as demanding as a formal specification of a specific system. We do not have to write a complete set of axioms that would be used in an algebraic term-rewriting system and we do allow incompleteness of the specification in order to allow a variety of systems with different functionality to comply to the standard. In the case of RSS, our specification not only provides the guidelines and the requirements of the standard, but it can also check the validity of a service that wants to communicate with the RSS protocol through the DTD functionality.
5 Conclusions In this paper, we presented the idea of applying formal specification techniques to describe the requirements of open standards. We formally specified RSS v2.0 in an attempt to show how this idea can help out to reduce the usual problems that come with natural specification documents. In brief, a formalized standard can (1) enhance communication and understanding among various stakeholder groups (2) be a management tool for various management teams and (3) standardize production and production lines. Among the reasons for not attempting formalization of open standards and other standards in general might be the
feasibility of the approach. One might ask if formal methods are applicable to all types of standards. RSS looks like a suitable case, but other cases might require a different specification approach and formal method which might be more or less known to different groups of people. We chose CafeOBJ, an in-house specification tool that we are very familiar with in modeling and specifying concepts like those of open standardization. Besides, RSS is a standard that emphasizes data structure so OBJ family is a very suitable family of languages to formalize that. A more wide-spread specification language like Z [17], VDM (Vienna Development Method), Estelle, Lotos, Petri Nets, SDL, TLA, or Raise, would be equally suitable. Unfortunately many software engineers are not educated enough in the use of formal methods, or they apply them very rarely. Standards’ readers could be alienated if not sufficiently prepared when reading a formally specified standard, but the process of getting acquainted with one is not that difficult. Acknowledgments The authors of this paper would also like to extend their thanks to Iulia Adomnita, Thrushna Nalam, Golnaz S. Nejad and Jari Veijalainen for the useful and fruitful collaboration.
References 1. Gödel’s incompleteness theorems. http://en.wikipedia.org/wiki/ G%F6del_incompleteness_theorem. Accessed 30 June 2013 2. Rss 2.0 specification. http://www.rssboard.org/rss-specification (2009). Accessed 30 Jan 2014 3. Sample file for rss v2.0. http://www.rssboard.org/files/sample-rss2.xml (2009) 4. Digital imaging and communications in medicine (dicom). http:// dicom.nema.org/ (2014). Accessed 11 Aug 2014 5. Berki E (2001) Establishing a scientific discipline for capturing the entropy of systems process models: Cdm-filters—a computational and dynamic metamodel as a flexible and integrated language for the testing, expression and re-engineering of systems. Ph.D. thesis, Faculty of Science, Computing & Engineering, University of North London 6. Berki E (2003) Formal metamodelling and agile method engineering in metacase and came tool environments. In: Tigka P, Kefalas K (eds) The 1st South-East European Workshop on formal methods. Agile formal methods: practical, rigorous methods for a changing world (Satellite of the 1st Balkan Conference in Informatics), pp 170–188. South-Eastern European Research Center (SEERC): Thessaloniki 7. Berki E (2006) Examining the quality of evaluation frameworks and metamodeling paradigms of information systems development methodologies, chap. 15, pp 265–289. Idea Group Publishing, Hershey, PA, USA 8. Berki E, Valtanen J (2007) Critical and creative mathematical thinking with practical problem solving skills—a new old challenge. In: Dranidis D, Sakellariou I (eds) Proceedings of the 3rd SouthEast European Workshop on formal methods (SEEFM07), pp 154–170. South-East European Research Centre (SEERC) 9. Berry DM, Kamsties E, Krieger MM (2003) From contract drafting to software specification: linguistic sources of ambigu-
123
K. Barlas et al.
10.
11.
12. 13.
14. 15.
16. 17. 18. 19.
20.
21.
22. 23.
24.
25.
26.
27.
28.
ity, a handbook. Online. http://se.uwaterloo.ca/~dberry/handbook/ ambiguityHandbook Blake G, Bly R (1993) The elements of technical writing. Elements of Series. Longman. http://books.google.fi/books?id= ewsoAQAAMAAJ Bowen JP, Breuer PT, Lano KC (1993) A compendium of formal techniques for software maintenance. BCS/IEE Softw Eng J 8:253– 262 Bowen JP, Hinchey MG (1994) Ten commandments of formal methods. IEEE Comput 28:56–63 Diaconescu R, Futatsugi K (1998) CafeOBJ Report: the language, proof techniques, and methodologies for object-oriented algebraic specification. AMAST Series in Computing, vol 6. World Scientific, Singapore Diaconescu R, Futatsugi K (2000) Behavioural coherence in objectoriented algebraic specification. J Univ Comput Sci 6(1):74–96 Diaconescu R, Futatsugi K, Iida S (1999) Component-based algebraic specification and verification in cafeobj. In: Proceedings of the Wold Congress on formal methods in the development of computing systems, vol II, FM ’99, pp 1644–1663. Springer-Verlag, London, UK. http://dl.acm.org/citation.cfm?id=647545.730763 Boyer RS, Moore JS (1981) The correctness problem in computer science. Academic Press, Orlando Diller A (1990) Z—an introduction to formal methods. Wiley, Chichester Goguen J (2014) The obj family. http://cseweb.ucsd.edu/~goguen/ sys/obj.html. Accessed 30 Oct 2014 Goguen JA, Meseguer J (1992) Order-sorted algebra i: equational deduction for multiple inheritance, overloading, exceptions and partial operations. Theor Comput Sci 105:217–273. doi:10.1016/ 0304-3975(92)90302-V Goguen JA, Winkler T, Meseguer J, Futatsugi K, Jouannaud JP (2000) Introducing obj. In: Goguen J, Malcom M (eds) Software engineering with OBJ: algebraic specification in action. Springer US, Boston, pp 3–167 Hierons RM, Bogdanov K, Bowen JP, Cleaveland R, Derrick J, Dick J, Gheorghe M, Harman M, Kapoor K, Krause P, Lüttgen G, Simons AJH, Vilkomir S, Woodward MR, Zedan H (2009) Using formal specifications to support testing. ACM Comput Surv 41(2):9:1–9:76. doi:10.1145/1459352.1459354 Hoe NS (2006) Free/open source software, open standards. Elsevier, New Delhi Ionescu C, Berki E, Nummenmaa J (2009) Applying weighted finite state machines to protocol performance analysis. In: Proceedings of the 2009 Fourth South-East European Workshop on formal methods, SEEFM ’09, pp 40–45. IEEE Computer Society. doi:10.1109/SEEFM.2009.16 ISO/IEC: Information Processing Systems (1990) Open systems interconnection—guidelines for the application of Estelle, Lotos and SDL. ISO/IEC TR 10167. International Organization for Standardization, Geneva, Switzerland ISO/IEC: Open Distributed Processing (1996) Basic reference model—part 4: architectural semantics. ISO/IEC 10746-4. International Organization for Standardization, Geneva, Switzerland ISO/IEC: Information Technology (1997) Framework: formal methods in conformance testing. ISO/IEC 13245-1. International Organization for Standardization, Geneva, Switzerland ISO/IEC: Information technology (2015) Open distributed processing—use of UML for ODP system specifications. ISO 19793:2015. International Organization for Standardization, Geneva, Switzerland Jackson M (1995) Software requirements & specifications: a lexicon of practice, principles and prejudices. ACM Press/AddisonWesley Publishing Co., New York
123
29. Karjalainen M (2010) Large-scale migration to an open source office suite: an innovation adoption study in Finland. Department of Computer Sciences, University of Tampere, Tampere 30. Lightfoot D (1991) Formal specification using Z. Macmillan Education UK, London 31. Malcolm G, Goguen JA (1996) An executable course in the algebraic semantics of imperative programs. In: Hinchey M, Nevill Dean C (eds) Teaching and learning formal methods. Academic Press, pp 161–179 32. Merruko M (2013) Utilising open source software development for effective electronic health records development. Master’s thesis, School of Information Sciences, University of Tampere. http:// tutkielmat.uta.fi/pdf/gradu06632 33. Merruko M, Berki E, Nykänen P (2012) Open source software process: a potential catalyst for major changes in electronic health record systems. In: Shaikh S, Stamelos I, Cerone A (eds) OpenCert 2012 + SEFM 2012 Proceedings (2013). http://opencert.iist.unu. edu/Papers/2012-paper-1-D 34. Muhonen M, Berki E (2011) An open process for quality assurance in systems. In: Dawson R, Ross M, Staples G (eds) The conference proceedings of software quality management XIX (SQM 2011). Loughborough University, Leicestershire, pp 231–241 35. Newcombe C, Rath T, Zhang F, Munteanu B, Brooker M, Deardeuff M (2015) How Amazon web services uses formal methods. Commun ACM 58(4):66–73. doi:10.1145/2699417 36. Raymundo O (2015) Hands-on with news in ios 9: Apple’s response to facebook and snapchat’s content platforms. http://www. macworld.com/article/2947012/software-news/hands-on-withnews-in-ios-9-apples-response-to-facebook-and-snapchatscontent-platforms.html 37. Shah R, Kesan J, Kennis A (2007) Lessons for open standard policies: a case study of the massachusetts experience. In: Proceedings of the 1st international conference on theory and practice of electronic governance, ICEGOV ’07, pp 141–150. ACM, New York, NY, USA. doi:10.1145/1328057.1328088 38. Sommerville I (2010) Software engineering, 9th edn. AddisonWesley, Harlow 39. Spivey J (1989) An introduction to z and formal specifications. Softw Eng J 4(1):40–50 40. Turner KJ (1993) Using formal description techniques: an introduction to ESTELLE. LOTOS and SDL, Wiley, New York 41. Valtanen J, Berki E, Barlas K, Li L, Merruko M (2013) Problemfocused education and feedback mechanisms for re-designing a course on open source and software quality. In: Uhomoibhi J, Barikzai S, Ross M, Staples G (eds) The 18th INSPIRE— INternational conference on Software Process Improvement in Research, Education and Training. Southampton Solent University Press, London, pp 23–36 42. Veijalainen J, Berki E, Lehmonen J, Moisanen P (2006) Implementing a new international paper mill efficiency standard—using computational correctness criteria to model and verify timed events. In: Eleftherakis G (ed) 2nd South-East European Workshop on formal methods (SEEFM 05). Formal methods: challenges in the business world, Ohrid, 18–19 Nov 2005, pp 27–43. South-East European Research Centre (SEERC) 43. W3Schools (2014) Dtd tutorial. http://www.w3schools.com/dtd/ 44. W3Schools (2014) Xml tutorial. http://www.w3schools.com/xml/ 45. Winer D (2006) The rss “ttl” element and p2p networks. http://scripting.com/2006/09/07.html#theRssTtlElementAndP2p Networks