Pub Res Q (2014) 30:1–10 DOI 10.1007/s12109-014-9346-7
Open Source Textbooks: A Paradigm Derived from Open Source Software
Seth D. Bergmann
Published online: 30 January 2014
© Springer Science+Business Media New York 2014
Abstract This work presents a new paradigm for the creation and publication of textbooks: open source. The phrase open source is borrowed from the computer software industry, where the word source has a technical meaning explained in this paper; open source software is software that has been developed by many collaborators, using the internet to produce a final product. The contributors receive no financial compensation, yet there have been many successful open source software projects (Linux, Open Office, Apache, etc.). Open source textbooks use a similar financial model: the authors and contributors receive no direct financial compensation for their work. Contributors are listed in the produced work as primary author(s), co-authors, contributors, minor contributors, etc., according to the magnitude of their contribution. The produced work is available free to users on the internet. This paper explains the open source process, provides justification for open source as an effective paradigm, and presents some existing open source textbook projects, as well as the author’s own open source textbook project.
Keywords Open source · Textbook · Publication · Compiler
Beyond Textbooks
The open source paradigm could easily be applied to books other than textbooks. Histories, essays, and other works of nonfiction could be produced with the open source model as well. Consider the author who spends the better part of his/her life collecting information for an extensive account of a historic event; he/she is probably motivated by factors other than pure profit. Such books could be produced much more quickly by authors cooperating through open source.
Prior Work
This paper is a more detailed version of a paper which was presented at the Eleventh International Conference on Books and Publishing. The earlier, less detailed version was published in the International Journal of the Book [3].
S. D. Bergmann
Computer Science Department, Rowan University, Robinson Hall 330C, 201 Mullica Hill Road, Glassboro, NJ 08028, USA
e-mail: [email protected]
Introduction
In recent years there has been a surge of interest in open source software in the computer software industry. Open source software is software (computer programs) developed collaboratively by many people who use the internet to coordinate their work. In most cases the developers of a single project never meet face-to-face; nevertheless, their collaboration often produces complex yet high-quality software. The developers receive no direct financial compensation for their work, and the resulting product is freely available to others on the internet. Some well-known examples of open source projects are Linux (an operating system), Open Office (a package of end-user applications compatible with Microsoft Office), Mozilla Firefox (a web browser), Android (a platform for smartphone and tablet applications), and Apache Server (internet server software).
This paper describes a similar development process for textbooks (and possibly other kinds of books). The process is called open source textbooks because it uses the same development model as open source software. Several open source textbooks have been produced, and several more are currently in production. This paper explores the open source process, some of its motivating factors, and the challenges it faces with respect to textbooks. Most open source textbooks do not involve a publisher in the traditional sense, because the terms of the open source license would not permit a publisher to distribute copies of the work for compensation. Some open source textbooks rely on a publisher to produce hard copy and/or a distributor to sell electronic or hard-copy versions for a nominal fee.1 Finally, this paper presents some of the open source textbook projects currently in progress, along with some recent trends in the publishing industry, of which open source is merely one manifestation.
1 See the discussion of Green Tea Press in the section on Current Open Source Textbook Projects and Trends below.
Source: A Technical Definition
Webster’s II New Riverside University Dictionary defines source as: A point of origin; a body of water at which a stream or river originates; one that causes, creates, or initiates; one that supplies information; a record, as a book or document, supplying primary or firsthand information [11]. In Computer Science, however, the word source is used in a technically specific way. Consequently, the word has entered common usage in an unusual way in recent years. People speak of outsourcing a business or a segment of a business, as in “we plan to outsource our customer service division”. This usage of source does not correspond to any of the traditional definitions found in the dictionary.
To explain the new semantics for this word, a brief lesson from computer science is required. Computer hardware (specifically a CPU, or Central Processing Unit) is capable of executing programs consisting of primitive instructions, coded in binary, or base two. These primitive instructions are limited to simple arithmetic operations (add, subtract, multiply, divide, …), data transfer instructions (moving data between the computer’s active memory and the CPU), I/O (Input and Output: moving data between the computer’s active memory and peripheral devices such as disks, flash memory sticks, monitors, keyboards, printers, …), and instructions capable of making simple true/false decisions. The language in which programs are expressed using only these primitive binary instructions is called machine language. Needless to say, programming a computer in machine language is exceedingly tedious and difficult.
It is for this reason that high-level programming languages were developed in the 1950s. These languages enabled people to write programs using algebraic notation and English words, as opposed to the binary codes of machine language. The only problem with high-level languages is that the CPU cannot execute them; they must be translated into machine language before they can be executed. This is the purpose of a program called a compiler [2], which is depicted below. The compiler accepts as input a program written in a particular high-level language and translates it into an equivalent program written in the machine language of a particular CPU. The people who wrote the first compilers used the word source to refer to the program given as input to the compiler, as in the source program, and to the language2 of all such programs, as in the source language. The output produced by the compiler, the binary machine language program, is called the object program. For example, if a compiler is designed to translate any program written in FORTRAN to Intel 8086 machine language, its source language is FORTRAN and its object language is Intel 8086 machine language. This usage of the word source is certainly consistent with the traditional dictionary definitions.
[Figure: A compiler translates a source program, such as “If (wage < 100) then tax is 0”, into an object program of binary machine instructions, such as 01101 (compare) and 10101 (move).]
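To make the source/object distinction concrete, here is a deliberately tiny, hypothetical compiler sketched in Python. Its entire source language is the single statement form from the figure, and its object language uses invented 5-bit opcodes (two of which echo the bit patterns in the figure). A real compiler handles a full grammar and emits only binary code; this sketch prints each opcode next to a readable mnemonic.

```python
import re
from typing import List

# Hypothetical 5-bit opcodes for an imaginary CPU (real machine languages differ).
# "01101" and "10101" echo the bit patterns shown in the figure.
OPCODES = {
    "load": "00001",     # data transfer: memory -> CPU
    "compare": "01101",  # simple true/false decision
    "branch": "00111",   # skip ahead when the decision is false
    "move": "10101",     # data transfer: constant -> memory
}

# The entire "source language" of this toy compiler is one statement form.
STATEMENT = re.compile(r"If \((\w+) < (\d+)\) then (\w+) is (\d+)")


def compile_statement(source: str) -> List[str]:
    """Translate one source statement into a list of object-code instructions."""
    match = STATEMENT.fullmatch(source.strip())
    if match is None:
        raise SyntaxError(f"cannot compile: {source!r}")
    variable, limit, target, value = match.groups()
    instructions = [
        ("load", variable),
        ("compare", limit),
        ("branch", "if not less, skip 1"),  # e.g. wage >= 100: leave tax unchanged
        ("move", f"{value} -> {target}"),
    ]
    # Each object instruction: binary opcode, then a readable mnemonic and operand.
    return [f"{OPCODES[op]} {op} {operand}" for op, operand in instructions]


for line in compile_statement("If (wage < 100) then tax is 0"):
    print(line)
```

Running it prints one object instruction per line, beginning with `00001 load wage`. Keeping the mnemonic beside each opcode is purely a readability choice for this illustration; the CPU itself would see only the bits.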
In the mid-1990s, several software companies decided to move their software development operations to foreign countries, such as India, where there was a good supply of software professionals; at that time some of these companies coined the term outsourcing to describe this decision. The term caught on, and now any business operation that is moved overseas is said to be outsourced.
2 The word language is being used in a technical sense here; formal linguists define a language to be a set of strings of alphabetic characters.
Open Source Software
When proprietary software is distributed, only the machine language version, or executable, is sent to users; to use the software, they do not need the original source language program (known as source code) from which the executable was derived. This is how software developers maintain control of the product and prevent piracy: source code is not distributed but is carefully protected by the developers. Such was the normal mode of operation in the software industry until the internet arrived.
At that time, independent software developers discovered that they could collaborate informally, using the internet for communication, to produce large, viable software products. The source code they produced was made publicly available on the internet for anyone who wished to make corrections or enhancements to it. Even though these developers made their source code public and received no direct financial compensation for their efforts, open source projects soon became competitive with traditional proprietary software in areas such as web browsers and the so-called office applications. This new mode of operation came to be called open source, and it stunned the proprietary software development companies.
Large software projects require extensive managerial oversight; much of the development effort is spent on requirements analysis and software design before the first line of code is written. During and after programming, extensive testing, validation, and customer acceptance are required. This is no less true for open source projects than for proprietary ones, which means that the originator(s) of the project are responsible for assuring the quality of the final result. To assist with this they often rely on public, web-based source code repositories. These web sites, the largest of which is SourceForge.net, offer a variety of free services that enable open source projects to manage the multitude of source files required for a large development project.
It was not long before open source projects began to overtake some of the most widely used proprietary software packages. Some examples are:
• Linux: An operating system for desktop computers.
• Open Office: A suite of personal computer applications compatible with Microsoft Office.
• Mozilla Firefox: A web browser derived from Netscape.
• Apache Server: Internet server software, now the leading option in this area.
• Eclipse: An Integrated Development Environment for software developers.
• Oracle (formerly Sun) Java: A software developer’s compiler and toolkit.
Open Source Textbooks: A New Paradigm
The Open Source Process
An open source textbook will usually begin with one individual, the primary author (also referred to as the originator or owner of the project). The primary author is responsible for defining the subject and topics, table of contents, scope, intended audience, format, style, and required ancillary materials for the book. In addition, the primary author would create at least a few chapters to establish a framework for additional chapters. These first few chapters would thus illustrate:
• The desired writing style of the narrative parts of the book.
• Examples of figures or diagrams to be included.
• The format for sample problems and/or exercises to be solved by the reader.
• The desired style for chapter headings, sections, sub-sections, page formats, etc.
• Any other formatting or stylistic aspects which should be consistent in additional chapters.
To create the initial chapters, the primary author will need to select a software tool, such as a word processor, desktop publishing software, or document preparation software. The primary author may also wish to select software tools for developing graphics, figures, diagrams, charts, etc. All future contributors would be expected to use the same software tools in producing additional chapters and contributions. It is in the project’s best interest that all of these tools be open source. For example, Open Office Writer should be used as the primary development tool instead of Microsoft Word. This will encourage others who may not be able to afford proprietary software to join the project, and it is more in keeping with the spirit of the open source movement. In place of proprietary desktop publishing software, such as Adobe InDesign, many authors use LaTeX [8], an open source document preparation system that produces high-quality documents with powerful cross-referencing, mathematical formula, graphics, and indexing capabilities.3
Once this initial phase has been completed, the source files are made available on an internet web page. The source files are the original word processing, LaTeX source, or other originating document files (such as graphics documents for diagrams). If Adobe PDF files are generated from the source files, these PDF files would not be considered source files, because they cannot be edited directly.
At this point the project is open for additional contributions. Others are welcome to submit complete chapters, partial chapters, additional material for existing chapters, corrections to existing chapters, improved graphics for figures and diagrams, additional sample problems and/or exercises for the reader, etc. The primary author would be responsible for deciding which of these contributions are incorporated into the project.
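The distinction between editable source files and generated files such as PDF can be made concrete with a small Python sketch that sorts a project’s file list before the source bundle is posted for contributors. The extension sets below are assumptions chosen for illustration (LaTeX, BibTeX, OpenDocument, and SVG as sources; PDF and PostScript as generated output), and `split_project_files` is a hypothetical helper, not part of any existing tool.

```python
from pathlib import PurePath
from typing import Iterable, List, Tuple

# Assumed conventions for a hypothetical textbook project.
SOURCE_EXTENSIONS = {".tex", ".bib", ".odt", ".odg", ".svg"}  # editable originals
DERIVED_EXTENSIONS = {".pdf", ".ps"}  # generated from the sources, not edited directly


def split_project_files(paths: Iterable[str]) -> Tuple[List[str], List[str], List[str]]:
    """Separate editable source files from generated output and everything else."""
    source, derived, other = [], [], []
    for path in paths:
        extension = PurePath(path).suffix.lower()
        if extension in SOURCE_EXTENSIONS:
            source.append(path)
        elif extension in DERIVED_EXTENSIONS:
            derived.append(path)
        else:
            other.append(path)
    return source, derived, other


files = ["chapter1.tex", "figures/parse-tree.odg", "book.pdf", "README.txt"]
source, derived, other = split_project_files(files)
print("post for contributors:", source)
print("post for readers:", derived)
```

Files that match neither set are returned separately rather than guessed at, so the primary author can decide case by case whether they belong in the source bundle.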
If a contribution is accepted, it becomes a permanent part of the project, and the contributor is acknowledged in the preface to the book. The primary author could establish various levels of contribution for this acknowledgement, for example:
• Co-author: submission and acceptance of at least one complete chapter.
• Secondary author: submission and acceptance of at least two chapter sections.
• Contributor: submission and acceptance of at least one chapter section, or of at least three figures, diagrams, sample problems, etc.
• Minor contributor: submission and acceptance of any addition or correction to the project.
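The acknowledgement levels listed above amount to a simple decision rule. The following Python sketch encodes that rule under one reading of the list (levels checked from highest to lowest, with figures, diagrams, sample problems, etc. counted together as small items); the `AcceptedWork` record and `acknowledgement_level` function are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AcceptedWork:
    """Counts of one contributor's accepted submissions (hypothetical record)."""
    chapters: int = 0     # complete chapters
    sections: int = 0     # chapter sections
    small_items: int = 0  # figures, diagrams, sample problems, etc.
    corrections: int = 0  # any other addition or correction


def acknowledgement_level(work: AcceptedWork) -> Optional[str]:
    """Return the preface credit for a contributor, checking the highest level first."""
    if work.chapters >= 1:
        return "Co-author"
    if work.sections >= 2:
        return "Secondary author"
    if work.sections >= 1 or work.small_items >= 3:
        return "Contributor"
    if work.small_items + work.corrections > 0:
        return "Minor contributor"
    return None  # nothing accepted yet, so no acknowledgement


print(acknowledgement_level(AcceptedWork(small_items=3)))
print(acknowledgement_level(AcceptedWork(corrections=2)))
```

Checking the levels in descending order means each contributor receives only the highest credit earned; a primary author could of course choose different thresholds.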
When the primary author has determined that sufficient content has been submitted and that the accepted submissions are of acceptable quality, a standard document file for the book would be produced and made available on the internet; Adobe PDF is an example of such a standard format. At this point potential users can freely download and use the textbook: they are free to print the work on paper and bind it, or simply to use it in electronic form on a device such as a computer, tablet, or electronic book reader. Readers who use the electronic version benefit from the fact that updates, corrections, and future editions do not require reprinting on paper.
3 LaTeX is derived from TeX, a document preparation system primarily for scientific and technical publications, developed by Donald Knuth at Stanford University in the 1970s.
Economics and Legal Aspects of Open Source
The open source movement is driven primarily by the high cost of proprietary work. In the case of software, desktop publishing packages can cost hundreds of US dollars (alternatively, for a monthly fee of $50 US, one can use Adobe InDesign “in the cloud” under a rental agreement). Similarly, textbooks published and distributed in the traditional way by publishing companies are becoming prohibitively expensive. Some engineering and science textbooks sell for over $200 US at the campus bookstore. The textbook for one of this author’s courses [1] is selling for $130.30 US. This 546-page paperback book is a fine work, currently in its fifth edition, and has been adopted at many universities around the world. It also includes significant programming projects for the student as ancillary materials. There is no question that the authors devoted significant time and effort to this project and deserve to be compensated; the publisher needs to cover development and printing costs, pay sales representatives, and pay other employees; and the campus bookstore (Barnes and Noble in this case) needs to pay overhead costs and employees and distribute profits to stockholders. However, consider the typical student, perhaps working part-time and incurring substantial debt to afford the high cost of education. How can the student afford to pay this much for the textbook in a single course?
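As a rough illustration of the software rental figure (assuming the subscription is kept for a full year at the quoted $50 US per month, with no discounts or taxes), the annual cost also comes to several hundred dollars:

$$12 \times \$50 = \$600 \text{ US per year}$$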
Another interesting aspect of open source is its economic model from the developers’ perspective. Why would someone devote significant time to developing a project for no financial compensation? To answer this, we look at the open source software movement, where thousands of developers have done, and continue to do, just that. These developers have benefited in several ways despite receiving no direct financial compensation:
• As a member of a successful project, the developer can include this on a resume when applying for employment, promotion, or a raise in salary.
• The experience gained from working as a team member on a successful project is valuable.
• Corporations which adopt the developed software will hire people to install it, support it, and train others in its proper use.
• For highly successful projects, such as Linux, new companies devoted to supporting the software have emerged (most notably Red Hat). Team members would be prime candidates for jobs with these companies.
With respect to open source textbooks, the first bulleted item above is the most relevant. Many university faculty are rewarded with tenure, promotion, or salary increases as a result of textbook publication. Although many textbooks have a large potential market and consequently produce substantial royalties for their author(s), many textbooks in so-called “niche” areas have smaller markets. In this case the royalty checks may be relatively small, but the publication earns prestige for the university, and the author(s) are consequently rewarded by the university; many authors see this as a primary motivation for producing a textbook. This author’s textbook on compiler design [2] serves as a good example. Because the rather technical subject of compiler design is not required by most Computer Science departments, the annual royalty checks never exceeded $300, and the book is now out of print (the copyright has reverted to the author). However, this author received a promotion in rank, largely due to the publication of the textbook. Many other university faculty members would be similarly motivated.
What are the legal aspects of open source? If someone acquires an open source work at no cost, can they then distribute it for profit? This depends on the licensing agreement which accompanies the work. Open source textbooks and other open content are often distributed with what is called a Creative Commons license.4 Such a license can specify that the work may be downloaded free, provided the user agrees not to distribute it for profit. Open source in no way invalidates, undermines, or contradicts copyright law; it is simply a different model for the production and distribution of intellectual property.
Challenges Facing the Open Source Paradigm
A serious challenge faced by those developing open source textbooks is ensuring a consistent style for the work. This includes not only the page format, the format of figures and diagrams, and the format of student exercises, but also the prose style. How can the primary author ensure that chapters submitted by others are written in a style that is consistent with, if not indistinguishable from, the other chapters? As mentioned above, the primary author would be responsible for producing at least a few chapters, and these initial chapters serve as a style guide for future contributors. When receiving additional submissions, the primary author would have a few choices:
• Accept the submission.
• Suggest changes to the submission to conform to the stylistic conventions already established.
• Accept the submission, and make the necessary stylistic changes.
• Reject the submission on the basis of inconsistent style.
Another challenge for open source development is the process generally known as forking a project. Since the source files for the project are publicly available, another party can obtain all of them and start a new project, perhaps making some changes and additions to the existing project. This happens most often when a contributor’s submission(s) have been rejected. The contributor may feel that the submission(s) are worthy and, if accepted, would significantly improve the project. The contributor would then incorporate the modifications into a new project (a copy of the original project, presumably with a different name). This is called forking (as in a “fork in the road”), and the contributor who does the forking becomes the primary author of the new project. If the community of contributors and users feels that the forked project is significantly better, it will attract many more contributors and users and will thrive (and the original project may continue, or it may wither). If the community feels otherwise, the forked project is likely to wither (in which case the original project is likely to thrive). This form of natural selection among open source projects is ultimately good for the community of users, because it tends to ensure that the surviving project(s) are the ones that best serve those users.
4 Creative Commons 2001. http://www.creativecommons.org.
The open source paradigm deals primarily with the process used to create a finished product. We have not yet addressed the issue of printing and/or distributing copies of the finished work for users. In the case of textbooks, the finished work is available for download on the internet, generally in a standard, freely available document format such as PDF or PostScript. Users are then free to view the work on an electronic reader or to print hard copy. The finished work may carry a limited copyright; for example, this author’s textbook is available free to all users, with the proviso that hard copies not be sold for profit (also, college professors adopting this book for use in a course are asked to notify the author).
In many cases hard copy will be of limited value, because a successful project will continue to evolve, making hard copies obsolete in time. Consequently, the phrase “finished product”, as used above, refers to a particular version, edition, or release of the product.
Another challenge faced by the open source textbook movement is that of convincing qualified authors to contribute to a project for no compensation. This challenge is being met with more success in a field like computer science, where contributors are familiar with the success of open source software and can see the other, indirect or less tangible, benefits of contributing to a project outlined in the section on Economics and Legal Aspects above.
Perhaps the most serious challenge to the open source paradigm for textbooks is acceptance by the user community. People are often wary of something new and different, and something which is free is often assumed to be of low quality. Indeed, where is the assurance of quality? A book published by a reputable publisher has been reviewed by at least a few levels of editors and possibly by external reviewers; consequently, such a publisher has a reputation for producing books of high quality. At present, open source books may have no such reputation. The open source software movement has clearly met this challenge (in its early years, users were understandably skeptical), but only after several years of existence.5 It will be interesting to see whether, and how, open source textbook projects also meet this challenge.
5 One of the early examples of a successful open source project is BSD Unix, a version of the Unix Operating System released by the University of California, Berkeley in 1977. The strong reputation of the Berkeley Computer Science Department contributed to the worldwide acceptance of this product.
Current Open Source Textbook Projects and Trends
This author is in the process of converting a previously published textbook [2] to open source.6 The book was originally produced using proprietary software tools: Aldus (Adobe) PageMaker (now InDesign), Microsoft Excel, and Microsoft PowerPoint were used to create the text, figures, and diagrams. Because the original is now out of print, a new version of the book is being produced with open source tools: LaTeX [8], DraTex [7], AlDraTex [7],7 and OpenOffice.8 Consequently, those wishing to contribute to the development of this book need not purchase a proprietary software package.
One of the first to adopt the notion of open source textbooks was Allen Downey, of Olin College of Engineering. He has formed his own open source site, Green Tea Press,9 which includes the following books:
• Think Python [5] (originally published as Python for Software Design by Cambridge University Press).
• Think Stats [6].
• Think Complexity [4].
Downey uses LaTeX to create the books, and relies primarily on O’Reilly10 to produce hard copies and on Amazon to distribute them for a nominal fee. These books all have extensive contributor lists, with contributions ranging from minor corrections to large submissions and complete translations into other (natural and programming) languages.
The California Open Source Textbook Project (COSTP)11 was established in 2001. A collaborative public/private undertaking, it was the first organization created to address the high cost, content range, and consistent shortages of K-12 textbooks in California. It has evolved into several other efforts, notably Open Textbook, Open Educational Resources, Open CourseWare, and Open Education, and COSTP itself has expanded its goals to include free textbooks and resource repositories. On September 27, 2012, California Governor Jerry Brown signed Senate Bill 1052 [9], which promotes the production and use of open source textbooks at state universities and community colleges to help alleviate the high cost of textbooks for students. The bill provides funding to establish a repository of open source textbooks, as well as support for the creation of such textbooks.
Open source textbooks are part of a more general trend in the publishing industry toward free resources, particularly in the life and physical sciences. Commercial publishers of scientific research (such as Elsevier, Springer, and Nature Publishing Group) are facing competitors which publish scientific papers for free download on the internet. “For scientific publishers, it seems, the party may soon be over. It has, they would have to admit, been a good bash”.12 Although copyright and proprietary access to publications have been the dominant modes of legal regulation of publications, recent developments “are altering traditional relationships between makers, distributors, and consumers of information products in ways that could mark the end of publishing as we have known it” [12]. The notion of open source textbooks is not a complete antithesis of traditional methods; authorship has long been a collaborative enterprise in the areas of business, government, the sciences, and the social sciences [10]. However, the huge success of the internet provides the impetus and resources needed to propel this new paradigm forward.
6 Rowan University. 1994. Last modified May 20, 2013. http://cs.rowan.edu/~bergmann
7 LaTeX is an extensible language; DraTex and AlDraTex are two examples of libraries of LaTeX extensions for drawing figures and diagrams.
8 Open Office is a complete suite of office applications, including word processing, spreadsheet, and presentation software, available free from Apache at http://www.openoffice.org.
9 Green Tea Press. 2004. Last modified April 24, 2013. http://www.greenteapress.com.
10 O’Reilly. 1996. Last modified June 3, 2013. http://www.oreilly.com.
11 California Open Source Textbook Project. 2001. Last modified June 3, 2013. http://www.opensourcetext.org.
References
1. Barnes DJ, Kölling M. Objects first with Java: a practical introduction using BlueJ. Boston: Pearson; 2012.
2. Bergmann S. Compiler design: theory, tools, and examples. Dubuque: Wm. C. Brown Publishers; 1994.
3. Bergmann S. Open source textbooks: a new paradigm for the publication of textbooks. The International Journal of the Book. 2013;11(1):59–65.
4. Downey AB. Think complexity. Cambridge: O’Reilly; 2012.
5. Downey AB. Think Python: how to think like a computer scientist. Cambridge: O’Reilly; 2012.
6. Downey AB. Think stats. Cambridge: O’Reilly; 2011.
7. Gurari E. TeX & LaTeX: drawing and literate programming. New York: McGraw-Hill; 1994.
8. Lamport L. LaTeX: a document preparation system. Reading: Addison-Wesley; 1994.
9. Trivedi A. Governor signs bills affecting state’s higher education institutions. The Daily Californian, Sept 28, 2012. Accessed 12 May 2013. http://www.dailycal.org.
10. Woodmansee M. On the author effect: recovering collectivity. In: Woodmansee M, Jaszi P, editors. The construction of authorship: textual appropriation in law and literature. Durham: Duke University Press; 1994.
11. Webster’s II New Riverside University Dictionary. Boston: Houghton-Mifflin; 1988.
12. Woodmansee M, Jaszi P, editors. Introduction to the construction of authorship: textual appropriation in law and literature. Durham: Duke University Press; 1994.
12 “Science & Technology: Free-For-All” in The Economist, May 4, 2013.