Requirements Eng (2009) 14:73–89 DOI 10.1007/s00766-009-0079-7
SPECIAL ISSUE: RE'07 BEST PAPERS
Towards automated requirements prioritization and triage
Chuan Duan · Paula Laurent · Jane Cleland-Huang · Charles Kwiatkowski
Received: 20 July 2008 / Accepted: 29 January 2009 / Published online: 6 March 2009. © Springer-Verlag London Limited 2009
Abstract Time-to-market deadlines and budgetary restrictions require stakeholders to carefully prioritize requirements and determine which ones to implement in a given product release. Unfortunately, existing prioritization techniques do not provide sufficient automation for large projects with hundreds of stakeholders and thousands of potentially conflicting requests and requirements. This paper therefore describes a new approach for automating a significant part of the prioritization process. The proposed method utilizes data-mining and machine learning techniques to prioritize requirements according to stakeholders' interests, business goals, and cross-cutting concerns such as security or performance requirements. The effectiveness of the approach is illustrated and evaluated through two case studies based on the requirements of the Ice Breaker System, and also on a set of stakeholders' raw feature requests mined from the discussion forum of an open source product named SugarCRM.

Keywords: Requirements prioritization · Requirements triage · Data mining · Non-functional requirements
C. Duan · P. Laurent · J. Cleland-Huang (corresponding author) · C. Kwiatkowski
School of Computing, DePaul University, 243 S. Wabash, Chicago, IL 60604, USA
J. Cleland-Huang e-mail: [email protected]
C. Duan e-mail: [email protected]
P. Laurent e-mail: [email protected]
C. Kwiatkowski e-mail: [email protected]
1 Introduction

In almost every project, budgetary restrictions and time-to-market constraints dictate the need for stakeholders to carefully prioritize and select a subset of requirements for development. As stakeholders often have very different objectives, they need to work collaboratively to identify conflicting requirements, negotiate solutions, and ultimately prioritize and triage requirements. The concept of triage, which is borrowed from the medical field, refers to the practice of quickly and systematically classifying victims of a disaster into one of three groups: those who have no hope of survival even with treatment, those who will recover even if they do not receive treatment, and finally those who can survive and live normal lives only if they receive treatment. When applied to the requirements problem, triage is defined as the process of determining which requirements a product or release should satisfy given the available personnel, time, and other resources [1].

As a result of recent emphasis on the requirements process, many project managers have increased their efforts to elicit and analyze requirements, leading to the generation of an increasingly large volume of stakeholders' requests. For example, the $170 million FBI Virtual Case File project, which involved almost 6 months of joint application design (JAD) sessions, ultimately produced an 800-page requirements specification [2]. According to Patton, a security advisor working on the project, the requirements document became bloated with excessive details describing how the system was to be designed but failed to capture the essence of what it was supposed to do. Despite significant effort invested into the requirements process, the project was ultimately written off as a total failure, attributed at least partially to problems in managing and organizing the requirements. Unfortunately these types
of problems abound in software development projects [3] and are only likely to increase as the scale and complexity of projects continues to rise. In a related problem, the increasing popularity of open source software has led to the proliferation of open forums in which stakeholders discuss problems and issues, and request new features. Project administrators must sift through these feature requests to identify stakeholders' requirements. However, managing, prioritizing and triaging these "raw" requirements can be challenging. For example, in a recent informal study of feature requests in ten popular open source projects, the requests were generally found to be poorly organized, and in several cases 70–90% of user-defined discussion threads contained only two or three comments, resulting in a significant redundancy of ideas across threads. In other open source projects we found a tendency for one or two mega-threads to contain hundreds of loosely connected feature requests. This paper extends our previous work presented at the International Requirements Engineering Conference in 2007 [4], in which we proposed a semi-automated technique for generating a list of prioritized requirements from a large set of raw stakeholders' requests, and then demonstrated how this prioritized list could be used to feed into and inform the triage process. The approach, which is named Pirogov after one of the inventors of early triage practices, uses clustering techniques to place requirements into multiple orthogonal categories that capture the diverse and complex roles played by individual requirements. For example, one clustering technique organizes requirements by feature sets; another identifies and categorizes non-functional requirements or early aspects; while others cluster requirements around user-defined themes such as business goals, high-level use cases, or existing code modules.
The end result is that every requirement is placed into one or more feature sets, while a cross-cutting subset of requirements is placed into additional categories. Stakeholders determine the relative value of each cluster and weight the importance of each clustering method. An objective function then generates prioritization decisions at the level of the individual requirement. The complete process is depicted in Fig. 1.
Fig. 1 Constructing a prioritized list of requirements: incoming requirements are passed to various clustering modules (unsupervised term-based clusters, cross-cutting NFR concerns, business goals, and additional clustering methods); a human analyst prioritizes the clusters and weights the relative importance of the category types; an objective function then combines the cluster priorities and category weights to generate a prioritized list of requirements
Although other researchers have used clustering to support various aspects of the requirements process, to the best of our knowledge no one has previously captured the complex roles played by individual requirements through automatically detecting and utilizing cross-cutting concerns to promote individual requirements during the prioritization process. This approach has several benefits. First, the human effort needed to prioritize requirements is significantly reduced. Although a more detailed explanation is offered later in this paper, a project with 1,000 requirements prioritized using a binary search tree (BST) would require approximately 10,000 comparisons. In contrast, our approach, assuming reasonable parameters for clustering sizes, would require only about 500 comparisons. The second primary benefit is that it enables stakeholders to make higher-level prioritization decisions, which are then automatically propagated down to the requirement level. As requirements prioritization can be fraught with tension, it makes sense to allow stakeholders to focus on fewer, more influential decisions. The third benefit is that multiple criteria can be considered during the triage process and stakeholders can even issue 'what-if' queries, such as "what if I want to bring forward architecturally significant requirements?" or "what if I want to focus primarily on features that contribute to clearly defined business goals?" Finally, an added benefit is that decisions made at the clustering level are reusable across different releases of the product, and can also be used to automatically filter incoming requirements and other stakeholder requests. Sections 2–8 of this paper describe our basic approach and illustrate its use for prioritizing a set of requirements [4], while Sect. 9 extends the work to investigate its use within a very different domain of open source software.
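The comparison counts cited above can be sanity-checked with a short script. The balanced-tree assumption and the average cluster size used here are illustrative parameters for the sketch, not figures taken from the study:

```python
import math

def bst_comparisons(n):
    """Approximate total comparisons to insert n items one by one into a
    balanced binary search tree: inserting item i costs about ceil(log2(i))."""
    return sum(math.ceil(math.log2(i)) for i in range(2, n + 1))

n_requirements = 1000
direct = bst_comparisons(n_requirements)

# Assumed clustering parameter: roughly seven requirements per cluster,
# so only the resulting clusters need manual pairwise ordering.
n_clusters = math.ceil(n_requirements / 7)
clustered = bst_comparisons(n_clusters)

print(f"direct BST: ~{direct} comparisons")         # on the order of 10^4
print(f"cluster-level BST: ~{clustered} comparisons")
```

Under these assumptions the cluster-level ordering needs roughly an order of magnitude fewer analyst comparisons than prioritizing every requirement directly.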
This extension requires the use of different and more robust data mining techniques in order to return effective results, but demonstrates the scalability and effectiveness of our approach. To summarize, Sect. 2 surveys and discusses current prioritization techniques. Section 3 describes the Ice Breaker data set, which is used for illustrative purposes throughout the paper. Section 4 surveys prior use of clustering in the requirements process and describes the basic
clustering scheme used in Pirogov. Sections 5 and 6 introduce the idea of using cross-cutting concerns to promote critical requirements. Section 7 describes the synergistic use of the orthogonal clusterings to generate a global prioritization scheme, and Sect. 8 describes the triage process itself. In Sect. 9 we provide an extended example which applies Pirogov to an open source project. The example is built on feature requests mined from a real open source project named SugarCRM. Finally, Sect. 10 concludes with an analysis of the approach and a discussion of future work.
2 Requirements prioritization methods

Many different prioritization techniques are used in practice. Often stakeholders simply place requirements into categories such as mandatory, desirable, or inessential [5], or else quantitatively rank the requirements [6]. For example, Beck [7] introduced the Planning Game to help customers prioritize user stories in eXtreme Programming. Although such techniques are relatively simple to implement, they provide little support for higher-level goal-setting and negotiation, and furthermore do not scale well for managing requirements in large and complex projects. More sophisticated methods combine the preferences or decisions made by multiple stakeholders. For example, Wiegers [8] proposed an approach in which stakeholders assign each requirement a score from 1 to 9 based on the importance, cost, and technical risk of the requirement. The priority value for the requirement is then calculated as importance/(cost + risk). In the 100-Point Method [9] each participant is given 100 points to vote for their most important requirements. Unfortunately, almost all of these methods can be easily manipulated by stakeholders seeking to accomplish their own objectives. Several prioritization techniques help stakeholders to compare the relative value of individual requirements. For example, a BST can be used to prioritize relatively large sets of requirements [10]. It is constructed by inserting less important requirements to the left and more important ones to the right of the BST. A prioritized list of requirements can then be generated through an in-order traversal of the completed tree. This approach is simple to implement and scales relatively well, but provides only a simple ranking of requirements without assigning any priority values. In comparison, the analytic hierarchy process (AHP) [11] uses a "pair-wise" comparison matrix to compute the relative value and costs of individual requirements in respect to one another.
It also includes a process to test the consistency of pair-wise decisions and is considered more accurate than other prioritization techniques; however, AHP does not scale well because the number of pairwise comparisons grows quadratically with the number of requirements.
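The BST technique described above can be sketched as follows; the numeric scores standing in for an analyst's pairwise judgments are purely illustrative:

```python
class Node:
    def __init__(self, req):
        self.req, self.left, self.right = req, None, None

def insert(root, req, is_higher_priority):
    """Insert a requirement into the BST: less important requirements go
    to the left, more important ones to the right."""
    if root is None:
        return Node(req)
    if is_higher_priority(req, root.req):
        root.right = insert(root.right, req, is_higher_priority)
    else:
        root.left = insert(root.left, req, is_higher_priority)
    return root

def prioritized(root):
    """In-order traversal yields requirements from least to most important."""
    if root is None:
        return []
    return prioritized(root.left) + [root.req] + prioritized(root.right)

# Toy stand-in: the analyst's pairwise judgment is simulated by a number.
reqs = [("R1", 3), ("R2", 9), ("R3", 1), ("R4", 7)]
root = None
for r in reqs:
    root = insert(root, r, lambda a, b: a[1] > b[1])
print(prioritized(root))  # least to most important: R3, R1, R4, R2
```

As the text notes, the traversal yields only an ordering; no priority values are attached to the requirements.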
Another class of prioritization technique supports the negotiation of agreements between stakeholders. For example, Theory-W, also known as "Win–Win" [12], requires each stakeholder to categorize requirements according to importance and perceived risk. Stakeholders then work collaboratively to forge an agreement through identifying conflicts and negotiating a solution. The requirements prioritization framework [13] supports collaborative requirements elicitation and prioritization, and includes stakeholder profiling as well as both quantitative and qualitative requirements ratings. Value-oriented prioritization (VOP) incorporates the concepts of perceived value, relative penalty, anticipated cost and technical risk to help select core requirements [14]. Although these techniques offer useful solutions for managing requirements in small or medium-sized projects, they do not provide a feasible solution for managing larger projects. In contrast, Pirogov provides sufficient automation to scale effectively for large projects.
3 An illustrative case study

Throughout the next several sections of this paper, the prioritization and triage process is illustrated using the requirements for the Ice Breaker System (IBS), described in several previous papers [15, 16]. These requirements were initially introduced as a case study in "Mastering the Requirements Process" [17] and then enhanced with requirements mined from documents obtained from the public works departments of Charlotte, Colorado; Greeley, Colorado; and the Region of Peel, Ontario. The IBS manages de-icing services to prevent ice formation on roads. It receives inputs from a series of weather stations and road sensors within a specified district, and uses this information to forecast freezing conditions and schedule dispersion of salt and other de-icing materials. The system maintains maps of the district in order to plan de-icing routes and to ensure complete coverage of all roads. It also manages the inventory of de-icing materials; maintains, dispatches, and tracks trucks in real time; and issues and tracks work orders. The IBS consists of 202 requirements, of which 23 represent non-functional constraints.
4 Automated clustering

Previously, researchers have used document clustering to support a number of activities such as information retrieval performance improvement [18], document browsing [19], topics discovery [20], organization of search results [21], and concept decomposition [22]. For document clustering the hierarchical approach is often preferred because of its
natural fit with the hierarchy found in many documents. Clusters are constructed from the bottom up by progressively identifying the most similar elements and placing them into a shared cluster. Several researchers have investigated use of the agglomerative hierarchical clustering algorithm to cluster documents. Steinbach et al. [23] found that this approach outperformed other comparable techniques. Hsia et al. [24] and Yaung [25] decomposed requirements into clusters, in order to facilitate incremental delivery by constructing proximities based on the references requirements held to a set of system components. Al-Otaiby et al. [26] used a traditional hierarchical clustering algorithm to enhance design modularity by computing proximities as a function of concepts shared between pairs of requirements. Chen et al. [27] computed proximities by manually evaluating requirements to identify resource accesses such as reading or writing to files, and from this used iterative graph-based clustering to automatically construct a feature model. Goldin et al. [28] implemented an approach based on signal processing to discover abstractions from a large quantity of natural language requirement texts. Although creating explicit partitions was not the purpose of the work, the resulting set of abstractions could be considered a fuzzy clustering because the similarity between two requirements could be inferred from their contained abstractions. Our approach differs significantly from the research surveyed above in that we apply multiple orthogonal clustering algorithms to capture the complex and diverse roles played by individual requirements. This knowledge is then used to automatically generate a list of prioritized requirements.
" Prðaj jqÞ ¼
123
#, Prðaj jti ÞPrðq; ti Þ
PrðqÞ:
ð1Þ
i¼1
The formula contains three primary components. Pr (aj|ti) is computed as freqðaj ; ti Þ Prðaj jti Þ ¼ P k freqðaj ; tk Þ and estimates the extent to which the information in t describes the concept aj in respect to the total number of terms in the source element, while Pr(q, ti) is computed as Prðq; ti Þ ¼
freqðq; ti Þ ; ni
where ni represents the number of elements in the collection containing the term ti. Finally, Pr(q) is calculated as Pr(q) = RiPr(q, ti) and represents the extent to which the term ti describes the query concept q, by returning a value inversely proportional to the number of occurrences of ti across all requirements. A more complete description of this algorithm is described in several other papers [15, 16]. The clustering algorithm was implemented in this experiment as follows: 1.
2. 3. 4.
4.1 Feature identification In the IBS example, we utilized a modified version of the standard hierarchical clustering algorithm [29] in order to generate an initial clustering of feature sets. As with most automated clustering algorithms, a proximity score is first computed between each pair of elements, or in this case between each pair of requirements. These scores are computed using a probabilistic network model previously developed to support automated requirements traceability [15, 16, 30]. This model computes the probability (Pr) that two artifacts represent the same concept, based on the distribution and co-occurrence of terms. A preprocessing step is first executed in which common ‘stop’ words are removed, and remaining words are stemmed to their root forms. The similarity between two elements is then computed as a probability where q represents the source element, aj represents the target element, and terms {t1,, t2,…, tk} represent the set of stemmed words.
k X
5.
Initially each requirement was assigned to an individual cluster so that if there were N requirements, there were N clusters. The similarity between each pair of existing clusters was computed using formula (1). The pair of clusters that were most similar to each other were identified and merged together. The similarity between the new cluster and each of the remaining clusters was recomputed by calculating the average distance between requirements across both clusters. Steps 2–4 were repeated until a stopping condition is met.
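A minimal sketch of steps 1–5, with a simplified term-overlap probability standing in for the full probabilistic network model of formula (1); stop-word removal, stemming, the stopping heuristics, and the clean-up steps are omitted, and the requirement texts and `stop_size` are illustrative assumptions:

```python
from collections import Counter
from itertools import combinations

def similarity(q_terms, a_terms, doc_freq):
    """Simplified similarity in the spirit of formula (1):
    Pr(a|q) = sum_i Pr(a|t_i) * Pr(q, t_i) / Pr(q)."""
    fa, fq = Counter(a_terms), Counter(q_terms)
    total_a = sum(fa.values())
    pr_q_ti = {t: fq[t] / doc_freq[t] for t in fq}   # Pr(q, t_i)
    pr_q = sum(pr_q_ti.values())                     # Pr(q)
    if pr_q == 0:
        return 0.0
    return sum((fa[t] / total_a) * pr_q_ti[t] for t in fq if t in fa) / pr_q

def agglomerate(reqs, stop_size=3):
    """Steps 1-5: average-linkage agglomerative clustering with a
    simplified stopping rule (stop when the best merge grows too large)."""
    docs = [r.lower().split() for r in reqs]
    doc_freq = Counter(t for d in docs for t in set(d))
    clusters = [[i] for i in range(len(docs))]       # step 1

    def avg_sim(c1, c2):                             # step 4's measure
        pairs = [(i, j) for i in c1 for j in c2]
        return sum(similarity(docs[i], docs[j], doc_freq) for i, j in pairs) / len(pairs)

    while len(clusters) > 1:                         # step 5 loop
        a, b = max(combinations(range(len(clusters)), 2),
                   key=lambda p: avg_sim(clusters[p[0]], clusters[p[1]]))  # steps 2-3
        merged = clusters[a] + clusters[b]
        if len(merged) > stop_size:                  # stopping condition
            break
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return [[reqs[i] for i in c] for c in clusters]

reqs = [
    "road maps shall be updated regularly",
    "the map shall be updated when roads change",
    "an alert shall be issued if a truck breaks down",
    "dispatched trucks shall be tracked",
]
for cluster in agglomerate(reqs):
    print(cluster)
```

On this toy input the two map-update requirements merge first; the run also illustrates why the real pipeline removes stop words and stems terms, since shared filler words like "shall" otherwise contribute to similarity.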
Suitable stopping conditions can be based on the average size of the clusters, or on evaluating a fit condition and stopping when average "fit" values start to decline. For these initial experiments clustering stopped when the average cluster size reached seven. This number was determined subjectively by observing the point at which clusters were internally cohesive and externally distinct from other clusters. (Note that the expanded case study reported in Sect. 9 of this paper describes a more sophisticated approach for determining granularity that we adopted in all subsequent experiments.) As a result of implementing the hierarchical algorithm, we observed a tendency for several requirements to remain individually un-clustered, even though they appeared to fit into one of the existing clusters. This occurred because larger clusters tended to assert a stronger 'gravitational'
pull than individual requirements in the particular clustering formula used in this initial experiment. The following clean-up step was implemented to address this problem:

6. Clusters containing only a single requirement were identified and merged with the cluster to which they exhibited the greatest similarity, as measured by their proximity score.
It was also observed that a few clusters became unnecessarily large and were therefore re-clustered at a finer level of granularity:

7. Clusters containing greater than twice the average cluster size were recursively re-clustered by following steps 1–6 to form more finely grained clusters.

The hierarchical clustering algorithm produces 'crisp' clusters, in which each requirement is placed into a single cluster. However, there are many requirements that naturally belong to two or more of the clusters. The fuzzy membership of each requirement in each of the existing clusters was therefore computed as follows:

8. For each individual requirement r, in respect to each of the identified clusters Cj, the similarity score Pr(Cj|r) was computed as defined in formula (1).

As a result of executing these steps, each requirement was associated with varying degrees of membership to each of the clusters. The term 'primary membership' was used to depict the cluster that each requirement most closely belonged to, while 'secondary membership' referred to the additional relationships identified in step 8. One of the weaknesses of term-based clustering approaches in the requirements domain is that requirements that have similar semantics but no shared terms would not be placed together in the same cluster. We have overcome this problem to some extent in our previous work on traceability, by supporting the use of project-level synonyms and acronym expansion [15, 16]. However, our previous studies have also shown that using a general thesaurus has an adverse effect on the quality of the results. Term-based approaches are also not sufficiently sophisticated to differentiate between contradictory requirements which might contain the same terms. However, it seems ideal that these types of conflicting requirements should be clustered together, where they will have a greater chance of being noticed by a human user and then resolved through standard requirements analysis and negotiation techniques.

4.2 Features in IBS

In the IBS case study, the clustering algorithm generated 41 distinct feature sets. The identified clusters included the following ones:

Update road data
• 9092: Road maps shall be updated regularly.
• 9087: The map shall be updated when changes are made to the physical roads.
• 9138: Update Road Data.
• 9404: Road maps shall be updated by importing data from an external source.
• 9170: Road maps shall be automatically validated against standardized digital maps.

Truck dispatch
• 9310: Truck dispatch.
• 9058: An alert shall be issued if a dispatched truck breaks down.
• 9142: A truck shall be categorized as dispatched or home.
• 9401: The dispatcher shall communicate with truck drivers through onboard computers.

Road closings
• 9151: If road conditions make the road impassable the road shall be closed.
• 9193: Road closings.
• 9152: Critical contacts shall be notified when a road is closed.
• 9403: The clerk shall mark road closings on the map.

Fails to transmit
• 9054: An alert shall be issued if a Weather Station fails to transmit readings.
• 9164: If a Weather Station fails to transmit data an alert shall be issued.
• 9165: If a road sensor fails to transmit data an alert shall be issued.

As this technique does not use prior knowledge to anchor the formation of clusters, the clusters do not intrinsically have the representative names they need in order to be prioritized by a human analyst. However, finding representative names or abstractions for text is a non-trivial open research question [28], and so we adopted a relatively simple approach of tagging parts of speech using Q-Tag and then identifying verb and noun phrases. If a cluster included a title from the requirements hierarchy, that title was adopted if it contained a top occurring phrase; otherwise, the phrase containing terms that occurred most frequently over all the requirements in the cluster was adopted.

4.3 Cluster level prioritization

Identified clusters can be manually prioritized using any of the techniques described in Sect. 2. Because it is likely that conflicts will occur between stakeholders, a collaborative
technique such as Win–Win [12] is highly recommended. As requirements negotiation is outside the scope of this paper, we assume that stakeholders successfully negotiate and prioritize requirements at the cluster level. For the purposes of the reported experiment, 41 clusters were generated for the IBS data set and prioritized using a BST. This approach was chosen for our initial experiments because it is relatively straightforward and creates a precise ordering between requirements, which was useful for evaluation purposes. During this manual phase of the process, stakeholders can also specify hard constraints based on identified dependencies between requirements representing either business or technical constraints [31]. If, for example, cluster A had a business dependency upon cluster B, then A could not deliver business value unless B was also deployed. From a prioritization perspective, if analysts assigned A a higher priority than B, then a dilemma would be introduced during automated requirements prioritization as to whether A should be demoted to B's prioritization level, or whether B should be promoted to A's level. In Pirogov, analysts are therefore asked to explicitly resolve these conflicts when setting precedence constraints.

4.4 Automated requirements prioritization

The cluster-level prioritization decisions and associated constraints can then be used to prioritize individual requirements and create a baseline ranking. For purposes of this paper, the baseline was created from the set of prioritized feature sets; however, alternate baselines such as those centered around business goals [14] are equally feasible. A number of informal experiments were conducted to evaluate the relative benefits of using fuzzy versus crisp clusters in the prioritization of requirements.
In fuzzy clusters each requirement belongs with varying degrees of membership to each of the identified clusters, while in crisp clusters each requirement belongs to one and only one primary cluster. Although the impossibility of creating a provably optimal prioritization scheme makes it difficult to formally evaluate the benefits of each approach, our informal observations have led to the conclusion that overuse of fuzzy clustering leads to somewhat chaotic prioritization results. On the other hand, allowing only crisp clusters ignores the fact that certain requirements exhibit clear relationships with more than one cluster. For example, the requirement stating "If a weather station fails to transmit data an alert shall be issued" is clearly associated with its primary cluster of "Fails to transmit" as well as another important cluster describing "Weather Station" functionality.
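The precedence constraints discussed in Sect. 4.3 can be checked mechanically before prioritization decisions are propagated; the cluster names, ranks, and dependency below are illustrative assumptions:

```python
def check_precedence(ranks, dependencies):
    """Flag dependency conflicts: if cluster A depends on cluster B,
    A must not be ranked strictly higher than B (rank 1 = highest)."""
    conflicts = []
    for a, b in dependencies:        # each pair reads "A depends on B"
        if ranks[a] < ranks[b]:
            conflicts.append((a, b))
    return conflicts

ranks = {"truck dispatch": 1, "road closings": 2, "update road data": 3}
# Assumed dependency: dispatching trucks delivers no business value
# without current road data.
deps = [("truck dispatch", "update road data")]
print(check_precedence(ranks, deps))  # analyst must resolve this conflict
```

Each reported pair corresponds to the demote-or-promote dilemma described in the text, which Pirogov asks the analyst to resolve explicitly.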
In Pirogov, this dilemma is resolved by implementing a hybrid approach. First, the highest similarity score Max = max_{i,r}{Pr(C_i | r)} is discovered over all possible combinations of requirements r and clusters C_i. The primary cluster (PC) is then identified for each individual requirement and its similarity score Pr(PC | r) is set to Max. This means that primary membership for all requirements over all clusters will always take precedence over secondary memberships. Requirements are then prioritized in respect to the current clustering method CM by computing their prioritization score PS as follows:

$$PS_r = \sum_{i=1}^{|C|} \Pr(C_i \mid r)\, R_{C_i} \qquad (2)$$

where C represents a set of clusters {C_1, C_2, …, C_{|C|}}, and R_{C_i} represents the ranking of cluster C_i within clustering method CM. As discussed at the beginning of this section, R_{C_i} values are provided by project stakeholders as the result of a standard prioritization process. Once PS values are computed, requirements can be ranked accordingly. Figure 2 depicts the spread in scores generated by Pirogov after the analyst prioritized the feature sets generated for IBS.
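The hybrid scoring of formula (2) can be sketched as follows; the fuzzy membership values and cluster rankings are toy assumptions, not values from the study:

```python
def prioritization_scores(membership, ranks):
    """Formula (2): PS_r = sum_i Pr(C_i|r) * R_Ci, after boosting each
    requirement's primary-cluster similarity to the global maximum."""
    global_max = max(p for req in membership.values() for p in req.values())
    scores = {}
    for r, clusters in membership.items():
        primary = max(clusters, key=clusters.get)        # primary cluster
        boosted = {**clusters, primary: global_max}      # Pr(PC|r) := Max
        scores[r] = sum(p * ranks[c] for c, p in boosted.items())
    return scores

# Toy fuzzy memberships Pr(C_i|r) and cluster rankings R_Ci (assumptions).
membership = {
    "9164": {"fails to transmit": 0.6, "weather station": 0.3},
    "9092": {"update road data": 0.8, "road closings": 0.1},
}
ranks = {"fails to transmit": 40, "weather station": 35,
         "update road data": 20, "road closings": 10}
print(prioritization_scores(membership, ranks))
```

The boost ensures that every requirement's primary membership dominates any secondary membership, while secondary memberships still nudge requirements with cross-cluster relationships upward.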
5 Promoting cross-cutting concerns

As pointed out by Nuseibeh [32], the processes of requirements analysis and architectural design are highly dependent upon each other, and non-functional requirements drive critical architectural decisions. It is therefore important to identify and prioritize globally significant concerns, including requirements that impact the architectural or interface design of the system, or that cross-cut the design in other ways. In Pirogov, cross-cutting and other architecturally significant requirements are identified through use of the NFR classifier [33].

5.1 The NFR classifier

The NFR classifier is a data mining tool designed to detect and classify a broad variety of NFR types related to attributes such as security, performance, or usability. The classifier is trained using a training set of pre-categorized requirements, and is primarily based upon the concept of weighted indicator terms, which weight each potential term according to the extent to which it indicates the presence of a specific NFR type. In the NFR classifier [33], indicator term weights are computed as follows:
Fig. 2 A baseline prioritization scheme
$$\Pr\nolimits_Q(t) = \left[ \frac{1}{N_Q} \sum_{i=1}^{N_Q} \frac{freq(d_{Q,i}, t)}{|d_{Q,i}|} \right] \cdot \frac{N_Q(t)}{N(t)} \cdot \frac{NP_Q(t)}{NP_Q} \qquad (3)$$

where the first factor

$$\frac{1}{N_Q} \sum_{i=1}^{N_Q} \frac{freq(d_{Q,i}, t)}{|d_{Q,i}|}$$

represents the average term frequency of term t in NFRs of type Q, scaled over the size |d_{Q,i}| of each document. The next factor N_Q(t)/N(t) computes the percentage of Q-type documents in the training set that contain t with respect to all requirements in the training set containing t, whose number is denoted by N(t). This factor penalizes the weighting if t is used frequently across requirements of all types in the specification. The third factor decreases the weight of the indicator term if term t is project specific: NP_Q(t)/NP_Q represents the ratio between the number NP_Q(t) of projects in the training set containing type Q NFRs that include the term t, and the total number NP_Q of projects in the training set containing type Q NFRs. Once indicator terms have been generated for a specific NFR type, the classifier can be used to detect that type of requirement in a new dataset. A probability score Pr_Q(R) is computed to evaluate the probability that a certain requirement R belongs to type Q as follows:

$$\Pr\nolimits_Q(R) = \frac{\sum_{t \in R \cap I_Q} \Pr_Q(t)}{\sum_{t \in I_Q} \Pr_Q(t)} \qquad (4)$$

where the numerator is computed as the sum of the term weights of all type Q indicator terms that are contained in R, and the denominator is the sum of the term weights for all type Q indicator terms. Extensive experiments were conducted using this approach [33] to evaluate the NFR classifier's ability to return correct results. For example, an experiment based on 15 MS students' projects returned an average recall of approximately 74% and precision of 16%, while a similar industrial study based on a dataset provided by Siemens Logistics and Automation returned recall of 80% and precision of almost 21%.

5.2 NFRs in the IBS
An initial observation of the IBS indicated that it included a significant number of security, usability, look-and-feel, reliability, and extensibility requirements. Although performance was also a potentially important concern, there were no performance requirements and so we were unable to consider them. The NFR classifier had previously been trained to detect security, usability, and look-and-feel requirements [33], but not reliability and extensibility ones. The IBS data set was therefore used to retrain the classifier to detect and classify these two types of requirements. The two top-ranked NFRs for each type are shown below:

• When new road sensors are added, the thermal map shall be updated to reflect the new weather data (extensibility).
• Road maps shall be updated by importing data from an external source (extensibility).
• Only authorized users shall access the system (security).
• Authorized access shall support three levels of security (security).
• If a road sensor fails to transmit data an alert shall be issued (reliability).
• If a Weather Station fails to transmit data an alert shall be issued (reliability).
• All screens shall comply with standardized look-and-feel style sheets (look and feel).
• The system interface shall minimize text input from a user by using drop-down menus, radio buttons, etc. (look and feel).
• The system shall provide access to multiple users (usability).
• Online help shall be provided (usability).
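The classifier behind these results can be sketched as follows; the tiny training set is an assumption for illustration, and the per-project factor of formula (3) is omitted since the sketch has no notion of multiple projects:

```python
from collections import Counter

def indicator_weights(reqs_by_type, q):
    """Sketch of formula (3) without the per-project factor:
    Pr_Q(t) = [avg length-scaled freq of t in type-Q reqs] * N_Q(t)/N(t)."""
    q_docs = [r.lower().split() for r in reqs_by_type[q]]
    all_docs = [r.lower().split() for rs in reqs_by_type.values() for r in rs]
    n_t = Counter(t for d in all_docs for t in set(d))    # N(t)
    nq_t = Counter(t for d in q_docs for t in set(d))     # N_Q(t)
    weights = {}
    for t in nq_t:
        avg_freq = sum(d.count(t) / len(d) for d in q_docs) / len(q_docs)
        weights[t] = avg_freq * nq_t[t] / n_t[t]
    return weights

def classify(req, weights):
    """Formula (4): fraction of total indicator-term weight present in req."""
    terms = set(req.lower().split())
    hit = sum(w for t, w in weights.items() if t in terms)
    return hit / sum(weights.values())

training = {  # assumed toy training set
    "security": ["only authorized users shall access the system",
                 "access shall require a password"],
    "other": ["road maps shall be updated regularly"],
}
w = indicator_weights(training, "security")
print(classify("authorized access shall support three levels of security", w))
```

Requirements sharing many high-weight indicator terms with the trained type score higher, which is how candidate NFRs such as those listed above are surfaced for review.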
For purposes of the case study, the five NFR types were prioritized on a scale of 0–5, with 5 being most important, as follows: reliability (5), usability (2), look-and-feel (2), security (1), and extensibility (1). Section 7 of this paper explains
how cross-cutting concerns such as these NFR types and other criteria are incorporated into the prioritization scheme to promote important requirements. The weighting of each NFR type is very dependent upon the overall goals and operating context of the system under development. For example, a kiosk-type system in an airport might be expected to have high usability requirements, while an inter-bank fund transfer system might have high security goals. If a system had very different NFR goals for its various components, then the prioritization process could be conducted iteratively, so that the major components were prioritized first, and then the NFRs within each component were prioritized as a second step. It is also unlikely that stakeholders would contribute a significant number of unnecessary NFRs, e.g., usability requirements for a system with a limited user interface, or security requirements for a non-critical internal system. Therefore a global approach, in which NFRs are promoted across the entire system, is unlikely to create many false prioritizations.
6 Additional clustering criteria

In addition to the two basic prioritization criteria, Pirogov allows stakeholders to define additional factors that they would like to use to promote or demote requirements in the prioritized ranking. To illustrate this, the two factors of business goals [14] and use cases are described; however, other factors such as stakeholder priorities, relation to existing and deployed features, or risk factors could also be considered. The following prioritized business goals were established for the IBS system:

1. To keep accurate maps of the road district.
2. To receive accurate and timely weather information from the weather stations.
3. To receive accurate and timely data from the road sensors.
4. To ensure that adequate de-icing materials are available at all times.
5. To maintain roads free from ice.
6. To keep the public informed when roads are closed.
7. To keep trucks maintained in good repair.
The probability of each requirement being associated with each of these business goals was computed using the automated traceability formula defined as formula (1) in Sect. 4.1. A simple membership criterion, which included only the top 20% of links scoring greater than zero probability, was established based on extensive studies we previously conducted into the recall and precision of the automated traceability tool [15, 16, 34]. As 361 potential links were generated between requirements and business goals, the top 72 links were selected. A similar process was executed for critical use cases, which defined high-level user tasks. For example, the use case to ''Maintain accident information'' was defined as follows:

• Track accident occurrences.
• Produce accident reports as needed.
• Maintain accident information.
• Update accident logs.
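The membership criterion described above can be sketched in a few lines. This is an illustrative Python fragment under our own naming assumptions: it keeps the top 20% of candidate trace links that score a non-zero probability, so 361 candidates yield 72 selected links.

```python
# Hypothetical sketch of the membership criterion: retain the top 20% of
# candidate trace links whose probability score is greater than zero.
def select_links(scored_links, fraction=0.20):
    """scored_links: list of (requirement_id, goal_id, probability) tuples.
    Returns the highest-scoring fraction of the non-zero-probability links."""
    nonzero = [link for link in scored_links if link[2] > 0]
    nonzero.sort(key=lambda link: link[2], reverse=True)
    cutoff = int(len(nonzero) * fraction)
    return nonzero[:cutoff]
```

With 361 non-zero candidates, int(361 * 0.20) truncates to 72, matching the count reported above.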
Traces were then generated between requirements and use cases; use cases were prioritized; and priority scores were generated.

6.1 Checking for orthogonality

For multiple classification techniques to be useful in the prioritization process, they need to produce orthogonal clusterings of requirements. Therefore each pair of the four clustering techniques was tested for orthogonality by generating coefficient vectors and computing the cosine of their angles. All possible pairs returned values within the range of 0.27–0.46, indicating weak correlation, which suggested that the clustering techniques produced sufficiently orthogonal results.

6.2 Prioritization of clusters

Prioritization scores, as described in formula (2), were then computed for each requirement in respect to each of the four clustering criteria. A small subset of these requirements and their associated scores is reported in Table 1. Figure 3 depicts the distribution of membership values for each requirement in respect to the four clustering techniques, ordered according to feature set priority scores. As the feature set clustering uses unsupervised learning techniques, all requirements belonged to at least one of these clusters. These baseline priority values, which are represented by the dashed line in Fig. 3, reflect the prioritization decisions made by the stakeholders in respect to each cluster, as well as the degree to which each requirement relates to the primary theme of that cluster. For example, the leftmost feature requests were all closely related to the topic of de-icing, which was ranked as the first priority by stakeholders. Requirement priority scores for NFRs, business goals, and user tasks are shown on the same graph, shaded in varying degrees of black and gray. As these three clustering techniques represent supervised learning, in which clusters are formed using a predefined description of the cluster characteristics, only a subset of requirements was categorized by each technique.
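The orthogonality test of Sect. 6.1 reduces to pairwise cosine computations over the coefficient vectors of the clustering techniques. A minimal sketch, assuming one coefficient vector per technique (the names and data layout are hypothetical):

```python
# Illustrative orthogonality check: cosine values near 0 indicate that two
# clustering techniques partition the requirements in unrelated ways.
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length coefficient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def pairwise_orthogonality(vectors):
    """vectors: dict mapping technique name -> coefficient vector.
    Returns the cosine for every unordered pair of techniques."""
    names = list(vectors)
    return {(a, b): cosine(vectors[a], vectors[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
```

Four techniques yield six pairwise cosines; in the study all six fell in the weakly correlated 0.27–0.46 band.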
As expected, NFRs were found
Table 1 A small subset of requirements showing priorities allocated by each of the four techniques

Requirements                                                        Feat. sets   NFRs   Bus. goals   Use cases
The scheduled de-icing shall be for a valid district                1.77                1.10
The system shall provide a list of contact services                 1.58
The material storing capacity of each truck shall be documented     1.46
If a road sensor fails to transmit data an alert shall be issued    1.12         0.62   0.19         0.05
Authorized access shall support three levels of security            1.85         0.12                0.18
Fig. 3 The scope of impact of four clustering techniques across all requirements
to be relatively cross-cutting, although a closer examination showed that different types of NFRs were more prominent across different sets of feature requests. For example, usability requirements tended to constrain requirements for street and thermal maps, while performance requirements tended to describe mapping, scheduling, and communication constraints. Business goals also cut across the prioritized feature requests and were useful for identifying business-critical requirements. For example, business goal #1, ''To keep accurate maps of the road district'', highlighted the feature request ''keep road data accurately updated'', depicted in Fig. 3 in the 94th position of the ranking. User tasks were not so evenly distributed across the baseline prioritized feature requests. Several user-related feature requests were grouped together on the very leftmost side of the graph, representing the task of managing de-icing, while a second group was found in the 132nd–145th positions, representing the task of driving de-icing trucks. This second group is interesting because it depicts a set of feature requests that were overlooked by the baseline prioritization method, but which were clearly highlighted by the search for critical user-related tasks. Once prioritization scores have been computed for individual clustering methods, stakeholders must determine
the relative value of each clustering method, answering questions such as ''Are business goals more important than NFRs?'' or ''Are there any project-level quality goals related to attributes such as security or usability that we should be considering?''. The global priority score (GPS), which takes all of these factors into account, can then be computed for each requirement by assigning relative weights W = {w1, w2, ..., wn} to each clustering method such that Σ_i w_i = 1. GPS is computed by enhancing the PS formula as follows:

GPS_r = Σ_{i=1}^{|CM|} [ w_i Σ_{j=1}^{|C^(i)|} Pr(C_j^(i) | r) R_C_ij ]   (5)
where CM = (CM1, CM2, ..., CMn) represents each of the clustering methods and all other notations are carried over from formula (2). As an example, consider the GPSs shown in Fig. 4 against a backdrop of the baseline priorities. In this scenario, the following weightings were used: feature sets (0.4), NFRs (0.2), business goals (0.3), and use cases (0.1). Weighting decisions in any given project must be made by considering the particular characteristics and market pressures of that project. In general, the resulting GPSs are
Fig. 4 Ranking requirements according to stakeholder determined weightings of four clustering factors
lower than the scores for prioritized feature sets because the baseline scores contribute only a fraction, in this case 4/10ths, of the total global score. The primary observation from these results is that a number of individual requirements received significantly higher rankings once additional factors were incorporated into the prioritization scheme. These requirements are depicted in the graph as the elevated bars that appear taller than their neighbors. If the requirements in the graph were re-sorted according to these global priorities, then the requirements ordering would change accordingly. An analysis of promoted requirements showed that for this particular prioritization scheme, 36 of the 202 requirements were promoted in ranking by 5% or more. The requirements that were promoted supported the core goals of the system, from the standpoint of the business goals and use cases: keeping the roads free from ice, maintaining the inventory of de-icing material, receiving and verifying weather information, and receiving data from road sensors. Table 2 shows the percentage of promoted requirements that were placed in the top 25 and 50% of the rankings for each individual clustering method. These results confirmed that the prioritization algorithm successfully took multiple criteria into consideration during the global prioritization process. We also conducted a subjective analysis of the promoted requirements to check that the correct requirements were being promoted. For example, requirement #9026, stating that ''A road sections list shall be maintained'', was ranked only in position 198/202 by the feature set prioritizations.

Table 2 Ranking of promoted requirements

Clustering method   Weight   Top 25% (%)   Top 50% (%)
NFR                 0.2      38            72
Business goal       0.3      97            100
Use cases           0.1      50            83
However, following global prioritization, this requirement was elevated by 40% to 116/202, primarily due to its association with the business goal to ‘‘keep the public informed when roads are closed’’. The relatively low priority of this particular business goal (5/6) prevented the requirement from being promoted higher up the list.
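The weighted combination in formula (5) can be sketched as follows, assuming the per-method priority scores PS (the inner sums of Pr(Cj|r) multiplied by the cluster ranks) have already been computed. The dictionary layout and names are illustrative, not the Pirogov implementation:

```python
# Hedged sketch of the global priority score (formula 5): a weighted sum,
# over clustering methods, of a requirement's per-method priority scores.
def global_priority(per_method_scores, weights):
    """per_method_scores: dict method -> this requirement's PS for that
    method (assumed precomputed); a method that did not categorize the
    requirement simply contributes 0.
    weights: dict method -> w_i, where the w_i sum to 1."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9
    return sum(weights[m] * per_method_scores.get(m, 0.0) for m in weights)
```

Under the scenario's weights {feature sets: 0.4, NFRs: 0.2, business goals: 0.3, use cases: 0.1}, a requirement scored only by the feature-set clustering retains just 0.4 of its baseline score, which is why the GPSs in Fig. 4 sit below the baseline.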
7 Triage

The purpose of creating prioritized lists of requirements is to provide an informed input into the triage process. Triage itself is influenced by numerous project-level factors such as the availability of personnel, funds, and calendar time. It is also influenced by the software development process, which dictates whether a product will be delivered through a series of mini-releases or in a single larger release. The prioritized list delivered by Pirogov can be used in a number of different ways. For example, stakeholders could start at the top and make inclusion decisions for each requirement based on resource availability until no more resources are available, or they could divide the requirements into priority groupings and focus on selecting requirements from the top groups only. To assess the effectiveness of our approach, five equally sized categories of 'must have', 'recommend having', 'nice to have', 'can live without', and 'defer' were created. The top 20% of prioritized requirements were placed into the first group, the next 20% into the second group, and so on. The results were then examined by two of the researchers to evaluate inclusion and exclusion errors with respect to previously determined prioritization goals. Inclusion errors incorrectly place non-important requirements into high priority categories, and were analyzed by inspecting the top two priority groups. About 58% of the requirements were found to be important ones, another 25% could be considered for early release if sufficient resources were available, while 17% were deemed to have been incorrectly prioritized. Fortunately, inclusion errors can be easily
resolved through analyst review. Exclusion errors occur when important requirements are placed into low priority groups and represent a more serious problem. These errors were evaluated by examining the two lowest priority groups; however, only one important requirement was found to have been incorrectly placed there.

8 Applying Pirogov in an open source forum

The previous example demonstrated how automated prioritization techniques could be used to prioritize a relatively small set of requirements. In this section we explore the application of this technique in the more realistic context of a large open source forum. SugarCRM is an open source customer relationship management system that supports campaign management, email marketing, lead management, marketing analysis, forecasting, quote management, case management, and many other features. Stakeholders enter feature requests or report problems in user-defined discussion threads. For purposes of this paper we mined 1,000 feature requests distributed across 239 threads from SugarCRM's open discussion forum, and then edited them to remove sections that had been duplicated from previous posts and to remove HTML and URLs [33]. The challenges of applying Pirogov in an open source forum are somewhat different from those of the IBS case study described in earlier sections of this paper. Whereas the IBS case study dealt with the prioritization of clearly articulated, succinct, and non-redundant requirements, the feature requests in the SugarCRM discussion forum contain a significant amount of background noise such as off-topic discussions, poor grammar, spelling errors, abbreviations, redundancies, and inconsistent use of terms. As such, the feature requests present a more difficult clustering challenge than the structured and carefully articulated requirements of the IBS data set. For this case study only three different factors were considered.
These included the level of interest stakeholders exhibited in specific feature requests, a subset of features specifically targeted by SugarCRM project managers, and finally a set of cross-cutting quality concerns in which stakeholders expressed interest.

8.1 Feature mining

As previously described, the first step in the prioritization process involves clustering needs into feature sets. Initially we considered using the threads explicitly created by stakeholders in the SugarCRM forum in place of automatically generated clusters; however, we observed a significant number of problems associated with these
threads. For example, 55% of the threads were found to be singletons, meaning that they included only one feature request, and several of them represented similar themes that could easily have been merged. In total, approximately 39% of the individual feature requests belonged to threads of three requests or fewer. Furthermore, many of the threads included off-topic discussions or evolved far beyond the original theme. Therefore, based on the poor quality of the user-defined threads, we decided to rely on unsupervised clustering techniques to identify primary features. The feature requests were first clustered using the same agglomerative hierarchical algorithm utilized in the IBS case study; however, several of the resulting clusters lacked a focused and cohesive theme or contained a significant number of seemingly unrelated feature requests. As a result of these problems, a consensus clustering technique was adopted. Consensus clustering has a slower running time, but has been demonstrated in our recent work to be significantly more robust than a standalone hierarchical approach [35, 36]. Consensus clustering generates multiple candidate clusterings and then implements a voting technique to generate the final result. Our implementation utilized a clustering algorithm known as spherical K-means (SPK), which is described in Fig. 5 and has been shown in our recent work to outperform hierarchical approaches [37]. The consensus clustering technique was then implemented as follows:

(a) Pre-processing. Stakeholders' needs were initially parsed to stem words to their root forms, remove common (stop) terms, and compute the term frequency-inverse document frequency (tf-idf) values for all terms. Intuitively, tf-idf weights terms more highly if they occur less frequently and are therefore expected to be more useful in expressing unique concepts in the domain. Each need was represented as a vector x_i = (f_i,j) for j = 1...W, where f_i,j is the weight associated with term t_j, and W represents the total number of terms.

(b) Granularity determination. The weighted need vectors v were then used to determine the optimal number of clusters K heuristically, using a technique known as the cover coefficient (CC) presented by Can [38]. CC estimates the optimal cluster number as a sum of degrees representing the extent to which each vector differentiates itself from other vectors. Formally, it is defined as:

K = Σ_i (1/|x_i|) Σ_{j=1}^{W} f_i,j^2 / N_j

where |x_i| is the length of x_i, and N_j is the total number of occurrences of t_j. This metric has been demonstrated to be effective in Can's work and validated in our earlier work on requirements clustering [35–37]. For the SugarCRM data K was computed as 40.
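The cover-coefficient estimate in step (b) can be sketched compactly. Because the formula above is reconstructed from garbled notation, this version rests on an interpretation: |x_i| is taken as the sum of x_i's term weights and N_j as the total weight of term t_j over the whole collection.

```python
# Sketch of Can's cover-coefficient estimate of the cluster count K,
# under the stated interpretation of |x_i| and N_j (an assumption).
from collections import defaultdict

def estimate_k(vectors):
    """vectors: list of dicts, each mapping term -> weight f_ij for one need.
    Returns K = sum_i (1/|x_i|) * sum_j f_ij^2 / N_j."""
    n_j = defaultdict(float)            # total weight of each term
    for x in vectors:
        for term, f in x.items():
            n_j[term] += f
    k = 0.0
    for x in vectors:
        size = sum(x.values())          # |x_i|
        if size == 0:
            continue
        k += sum(f * f / n_j[t] for t, f in x.items()) / size
    return k
```

As a sanity check, two identical one-term documents yield K = 1 (one cluster suffices), while two documents over disjoint vocabularies yield K = 2.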
Algorithm: Two-stage spherical K-means clustering
Input: unlabelled instances X = {x_i}, i = 1..N; number of clusters K; initial centroids I; convergence condition
Output: crisp K-partition P = {C_i}, i = 1..K
Steps:
1. Initialization: randomly initialize centroids using I: {mu_i^0} = I; t = 0. (Note: improved results can be obtained by identifying centroids that are as distant from each other as possible.)
2. Batch instance assignment and centroid update until convergence:
   2a. assign each instance x to the nearest cluster i, i.e., the one with the largest x^T mu_i^t;
   2b. update each centroid: mu_i^{t+1} = (Σ_{x ∈ C_i^{t+1}} x) / |C_i^{t+1}|;
   2c. t = t + 1.
3. Incremental optimization of the objective function until convergence:
   3a. randomly select an instance x from C_h^t and move it to cluster C_r^t so as to maximize the gain in the objective function J = Σ_{i=1}^{K} Σ_{x ∈ C_i^t} x^T mu_i caused by moving x;
   3b. update each centroid: mu_i^{t+1} = (Σ_{x ∈ C_i^{t+1}} x) / |C_i^{t+1}|;
   3c. t = t + 1.

Fig. 5 Two-stage spherical K-means clustering algorithm
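The batch phase of the algorithm in Fig. 5 can be sketched in a few lines of Python. This illustrative version assumes the instances are unit-length tf-idf vectors, initializes centroids by random sampling, and omits the incremental refinement of step 3 for brevity:

```python
# Sketch of the batch phase (step 2) of spherical K-means: assignment by
# dot product (cosine similarity on unit vectors), normalized centroids.
import math
import random

def normalize(v):
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v] if n else v

def spherical_kmeans(xs, k, iters=100, seed=0):
    rng = random.Random(seed)
    xs = [normalize(x) for x in xs]
    centroids = [list(x) for x in rng.sample(xs, k)]
    assign = None
    for _ in range(iters):
        # 2a. assign each instance to the cluster with the largest x . mu
        new_assign = [max(range(k),
                          key=lambda i: sum(a * b for a, b in zip(x, centroids[i])))
                      for x in xs]
        if new_assign == assign:        # converged: assignments stable
            break
        assign = new_assign
        # 2b. recompute each centroid as the normalized mean of its members
        for i in range(k):
            members = [x for x, c in zip(xs, assign) if c == i]
            if members:
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[i] = normalize(mean)
    return assign, centroids
```

On two well-separated groups of vectors this converges in a handful of iterations; the incremental phase of Fig. 5 would then refine the partition one instance move at a time.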
(c) Ensemble generation. A clustering ensemble of size R was generated by performing sub-sampling clusterings of v. In each sub-sampling run, a proportion a of the whole dataset was randomly extracted and partitioned into K clusters using spherical K-means, followed by the classification of each remaining need into its most closely related cluster. Based on extensive experiments applied to several different document data sets [37], we determined that a quality ensemble for data sets with several thousand data points could be generated by setting R to 200–500 and a to 0.5.

(d) Construction of a co-association matrix. A co-association matrix M = (c_i,j) of size N x N was constructed, where each element c_i,j = n_i,j / R and n_i,j represents the number of times the artifact pair x_i, x_j was assigned to the same cluster over the entire ensemble of partitionings. The underlying assumption of using a co-association matrix is that vectors belonging to a ''real'' cluster are very likely to appear in the same cluster across multiple partitionings of the data.

(e) Generation of consensus clustering. The average-link hierarchical agglomerative clustering algorithm (AHC) [39] was then applied to the co-association matrix to generate the final partitioning, where the values in the matrix were used to represent the proximity between pairs of artifacts. Several choices exist for clustering a co-association matrix M, such as alternate variants of AHC, spectral clustering, and graph partitioning algorithms; however, the average-link algorithm was chosen because it was demonstrated in our experiments to be very stable across ensembles of various sizes and characteristics.

The primary themes of the SugarCRM clusters are depicted in Table 3. These themes were named according to the top five terms that occurred most frequently across all of the feature requests within each cluster.

8.2 Feature prioritization

For experimental purposes we developed an answer set for a crisp clustering of the SugarCRM data. This reference set was constructed by reviewing and modifying the natural discussion threads created by the SugarCRM users. Modifications included merging closely related singleton threads, decomposing large mega-threads into smaller, more cohesive ones, and reassigning misfits to new clusters. The results, which were reviewed and agreed upon by two members of our research team, enabled us to evaluate the quality of the generated clusters using a metric known as normalized mutual information (NMI), which measures the extent to which knowledge of one clustering reduces uncertainty about the other [37]. NMI scores range from 0 (no similarity between clusterings) to 1 (identical clusterings). The SPK algorithm returned a mean NMI of 0.55, a maximum of 0.56, and a minimum of 0.515, compared to an NMI score of 0.57 for the consensus algorithm. We were unable to use NMI to compare the user-defined threads with the answer set because of the disparate number of clusters caused by the singletons in the user-defined threads. In related experiments, we found the clustering results for SugarCRM's feature requests to be slightly lower than published results obtained when clustering a set of benchmarked datasets taken from the TREC repository [40]. This slight loss of quality was not surprising given the significant amount of background noise, ambiguity, and verbosity found in the feature requests.
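NMI itself is easy to compute from the contingency counts of two label assignments. The self-contained sketch below uses the square-root normalization, which we assume here (other normalizations of mutual information exist):

```python
# Sketch of normalized mutual information between two crisp clusterings,
# normalized by sqrt(H(A) * H(B)) -- one of several common conventions.
import math
from collections import Counter

def nmi(labels_a, labels_b):
    """labels_a, labels_b: equal-length lists of cluster labels."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    # mutual information from the joint and marginal label distributions
    mi = sum((nij / n) * math.log((n * nij) / (ca[a] * cb[b]))
             for (a, b), nij in joint.items())
    ha = -sum((c / n) * math.log(c / n) for c in ca.values())
    hb = -sum((c / n) * math.log(c / n) for c in cb.values())
    return mi / math.sqrt(ha * hb) if ha and hb else 1.0
```

Identical partitions score 1 regardless of how the labels are named, while independent partitions score 0, which is what makes NMI suitable for comparing a generated clustering against an answer set.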
For purposes of the case study, each cluster was prioritized according to the level of interest shown by the SugarCRM stakeholders. Priority scores were computed by counting the number of feature requests and/or comments per cluster, normalized over the size of the largest cluster. The number of requests per cluster and the computed priority scores are shown in columns three and four of Table 3. Resulting priorities, computed using formula (2) from Sect. 4.4, are shown in Fig. 6. In this example, the top 20% of prioritized requests were categorized as 'must-haves', and the second 20% as 'nice-to-haves'. All other requests were deferred and are not shown on the graph. These results demonstrated a strong interest in two topics related to cluster #18 on {email, send, list, sugar, contact}
Table 3 Clusters, themes, and priorities for SugarCRM data

Cluster #   Cluster description                        Number of requests   Priority score
C1          Calendar, view, dai, appoint, user         58                   4.68
C2          Meet, calendar, schedul, call, invit       39                   3.15
C3          Sugar, spam, email, filter, crm            6                    0.48
C4          Sugar, us, mail, featur, email             21                   1.69
C5          Instal, php, file, post, uninstal          14                   1.13
C6          Databas, sql, sugar, us, data              24                   1.94
C7          Modul, clone, think, want, sugar           7                    0.56
C8          Sugar, busi, think, new, account           7                    0.56
C9          Lead, account, contact, want, field        24                   1.94
C10         Account, contact, address, wai, us         32                   2.58
C11         Displai, time, calendar, hour, long        9                    0.73
C12         Categori, contact, sugar, us, field        13                   1.05
C13         Languag, sugar, us, hi, contact            11                   0.89
C14         Sugar, us, window, pda, peopl              7                    0.56
C15         Sugar, portal, us, zimbra, just            18                   1.45
C16         Imag, gif, titl, alt, class                19                   1.53
C17         Interest, contact, us, manag, sugar        19                   1.53
C18         Email, send, list, sugar, contact          62                   5.00
C19         Delet, messag, email, contact, want        15                   1.21
C20         Updat, mass, field, record, select         17                   1.37
C21         Field, custom, studio, add, modul          43                   3.47
C22         Opportun, wai, us, close, see              32                   2.58
C23         Call, sale, sugar, process, contact        18                   1.45
C24         Team, assign, sugar, contact, us           25                   2.02
C25         User, default, role, new, set              32                   2.58
C26         Googl, sugar, wai, work, calendar          9                    0.73
C27         Support, cal, sugar, calendar, crm         18                   1.45
C28         Forum, sugar, us, com, thread              25                   2.02
C29         Sugar, crm, integr, us, version            45                   3.63
C30         Sugar, us, account, modul, integr          37                   2.98
C31         Search, sugar, field, us, user             33                   2.66
C32         Case, custom, support, sugar, us           24                   1.94
C33         Print, need, sugar, us, email              12                   0.97
C34         Number, phone, sugar, call, dial           21                   1.69
C35         Sync, sugar, us, outlook, calendar         30                   2.42
C36         Target, campaign, list, contact, email     55                   4.44
C37         Recruit, modul, sugar, crm, us             34                   2.74
C38         Tab, sugar, click, account, us             28                   2.26
C39         Project, task, time, us, date              25                   2.02
C40         Document, attach, us, account, modul       32                   2.58
and cluster #1 on {calendar, view, dai, appoint, user}. It should be noted that these topics are represented in stemmed form; therefore a term such as 'appoint' expands to words such as 'appointments', while 'dai' expands to the word 'daily'. The automated tool successfully retrieved stakeholders' requests that were dispersed across multiple threads in the original SugarCRM forum. For example, 14 of the retrieved feature requests originated from singletons, i.e., user-defined threads containing only one request, 31 were found in a thread labeled 'Email management', 6 in a thread entitled 'Improving Campaign Management in 5.0', and 5 in a thread entitled 'Forward archived emails'. Without the benefits of an automated retrieval tool, it would have been difficult to retrieve this dispersed set of feature requests from across the original user-defined SugarCRM threads.
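Under the stated scheme (requests per cluster, normalized over the largest cluster and scaled to a 0–5 range), the priority-score column of Table 3 can be reproduced with a one-line computation; the function name is our own:

```python
# Reproduces Table 3's priority scores: requests per cluster, normalized
# by the largest cluster (C18 with 62 requests) and scaled to 0-5.
def cluster_priorities(request_counts, scale=5.0):
    """request_counts: dict cluster_id -> number of feature requests."""
    biggest = max(request_counts.values())
    return {c: round(scale * n / biggest, 2) for c, n in request_counts.items()}
```

For example, C1 with 58 requests scores round(5 * 58/62, 2) = 4.68, matching the table.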
8.3 Target features

The second prioritization factor used in this case study represented features targeted for prioritization by the SugarCRM project managers. These were discovered and extracted by reviewing the comments and interactions of one particular stakeholder who served as SugarCRM's project administrator, and by identifying other specific themes of interest. The following goals were defined: (1) provide support for distribution lists on the email server, (2) incorporate time sheets, and (3) improve appointment scheduling. Figure 7 shows the prioritization of feature requests according to each of these three goals. For each of the three targeted features, the top 40 feature requests were retrieved and evaluated. The stakeholders' feature requests for a distribution list were primarily found in cluster 18, related to {email, send, list, sugar, contact}. Only 16 of the top 40 were found to be directly relevant; however, all of these relevant requests were retrieved in the top 20 items. About 14 feature requests related to timesheets were retrieved. These requests were distributed across seven different clusters, which demonstrates the value of allowing users to define their own classification themes and use them to classify and retrieve cross-cutting feature requests. Finally, feature requests related to scheduling appointments were primarily retrieved from the two clusters related to {calendar, view, dai, appoint, user} and {meet, calendar, schedul, call, invit}.

8.4 NFR detector

The NFR detector was also run against the SugarCRM data using the same set of indicator terms used for the IBS experiments. Candidate feature requests related to availability, maintainability, operability, scalability, security, and usability were detected. Based on an initial analysis of the results, the top 40 highest scoring feature requests were retrieved for each NFR type. Unfortunately, the terminology used to describe SugarCRM's calendar functionality conflicted with the indicator terms for availability and resulted in 100% false positives for this NFR type. Availability requirements were therefore not considered further. Security and usability were found to be the dominant NFR concerns, with precision of 62.5% for usability and 37.5% for security. These results are shown in Fig. 8. Five of the security concerns were, unsurprisingly, found in cluster 25 on the topic of {user, default, role, new, set}, while the remainder were dispersed across a number of other themes. Usability concerns were dispersed broadly across many different themes.

8.5 Global prioritization

Finally, the various prioritization mechanisms were combined into a global prioritization query. Equal weightings of 1/3 were assigned across feature requests, NFRs, and business goals. Individual feature requests were weighted as shown
Fig. 6 Feature requests prioritized according to stakeholders’ interest level in various topics
Fig. 7 Feature requests prioritized according to three different goals
Fig. 8 Security and Usability concerns in SugarCRM
in Table 3. The two NFRs of usability and security were prioritized with respective weightings of three and five, and the two business goals of (1) provide support for distribution lists from the email server and (2) incorporate time sheets were assigned respective weightings of five and four. The 100 top ranked feature requests were categorized as 'must-haves', and the next 100 as 'nice-to-haves'. As expected, a wide range of requirements related to the highest ranked features, business goals, and NFRs were selected and prioritized. The results, which combine the various priorities described in the previous sections, are depicted in Fig. 9. The feature requests that were individually selected according to stakeholders' interests, project management objectives, and NFRs are all shown in the global prioritization. Furthermore, a subjective analysis of the results demonstrated that the prioritized feature requests supported the global objectives. This case study therefore covered several objectives. First, it demonstrated the scalability of our approach to manage larger collections of data. All of the techniques described in this paper have individually been shown to scale to many thousands of requirements; however, this case study explicitly used them in a combined manner. Secondly, it demonstrated that the data-mining
techniques that we had previously applied to relatively well-structured requirements specifications could also be applied to much noisier raw requirements. This is an important finding, as the approach could be especially useful in organizing the massive amounts of stakeholder needs received by way of meeting notes, surveys, and memos. Finally, the case study demonstrated the use of semi-automated prioritization techniques for managing open source requirements forums.
9 Conclusions

This paper has described a technique for partially automating the prioritization of requirements and raw feature requests in large-scale elicitation processes. Feature requests and topics of interest can be identified either by using unsupervised clustering techniques or by classifying requirements against pre-defined themes. These themes could include business goals defined by project stakeholders, or could be more generic concepts such as the NFRs that can be re-used across multiple projects. The approach is not designed to replace existing prioritization schemes, but rather to automate low-level
Fig. 9 Global prioritization query
and arduous categorization tasks, allowing stakeholders to work at a higher level of abstraction. For example, stakeholders might still need to create a list of higher level business goals, and prioritize these using more standard methods such as binary trees, AHP, or categorization methods; however, the automated approach means that analysts do not need to read through thousands of raw requirements to identify requests related to specific topics. Furthermore, adopting these automated techniques can facilitate much larger scale requirements processes. The case studies and experiments described in this paper have shown the effectiveness of using automated prioritization techniques for more formal requirements specifications and also less structured and noisy ‘raw’ requirements, such as the feature requests found in open source forums. All of the techniques described in this paper have been designed to scale effectively to support large projects with many thousands of requirements. The framework is likely to be more helpful when applied to projects with large numbers of unstructured requirements or feature requests. Projects which are more carefully managed and for which requirements are elicited within a framework of carefully planned releases are not likely to benefit from this approach. The limitations of the approach are closely related to the limitations of the underlying traceability, classification, and clustering algorithms, all of which are based upon data mining and information retrieval techniques that are probabilistic in nature, and that therefore do not return perfect precision or recall in the results [15, 16, 32]. However, the use of consensus clustering has been demonstrated in our recent work to generate relatively high quality requirements clusters [37]. Furthermore, the use of orthogonal prioritization techniques means that critical requirements have more than one opportunity to be detected and promoted. 
In future work we will evaluate the approach across a far broader range of requirements and more diverse domains. We will also explore techniques for incorporating user feedback [4, 35] into the various clustering and classification algorithms in order to improve their precision and recall, and will investigate methods for incorporating additional prioritization techniques.
References

1. Davis AM (2003) The art of requirements triage. IEEE Comput 36(3):42–49
2. Goldstein H (2005) Who killed the virtual case file? IEEE Spectr 42(9):24–35
3. Standish Group (1995) CHAOS report
4. Laurent P, Cleland-Huang J, Duan C (2007) Towards automated requirements triage. In: IEEE conference on requirements engineering, New Delhi
5. Brackett JW (1990) Software engineering. In: Proceedings of software engineering institute, 19(1.2). Carnegie Mellon University, Pittsburgh
6. Karlsson J (1995) Towards a strategy for software requirements selection. Licentiate Thesis 513, Department of Computer and Information Science, Linkoping University
7. Beck K (2000) Extreme programming explained: embrace change. Addison-Wesley, Reading
8. Wiegers KE (1999) Software requirements. Microsoft Press, Redmond
9. Leffingwell D, Widrig D (2003) Managing software requirements: a use case approach, 2nd edn. Addison-Wesley, Boston
10. Mead NR (2006) Requirements prioritization introduction. Software Engineering Institute Web Publication, Carnegie Mellon University, Pittsburgh
11. Karlsson J, Ryan K (1997) A cost-value approach for prioritizing requirements. IEEE Softw 14(5):67–75
12. Boehm BW, Ross R (1989) Theory-W software project management: principles and examples. IEEE Trans Softw Eng 15(7):902–916
13. Moisiadis F (2000) Prioritising scenario evolution. In: 4th international conference on requirements engineering, Schaumburg, pp 85–94
14. Azar J, Smith RK, Cordes D (2007) Value-oriented requirements prioritization in a small development organization. IEEE Softw 32–73
15. Cleland-Huang J, Settimi R, Duan C, Zou X (2005) Utilizing supporting evidence to improve dynamic requirements traceability. In: International requirements engineering conference, Paris, France, pp 135–144
16. Cleland-Huang J, Settimi R, BenKhadra O, Berezhanskaya E, Christina S (2005) Goal-centric traceability for managing non-functional requirements. In: International conference on software engineering, pp 362–371
17. Robertson S, Robertson J (1999) Mastering the requirements process. Addison-Wesley, Reading
18. Kowalski G (1997) Information retrieval systems: theory and implementation. Kluwer, Dordrecht
19. Cutting DR, Karger DR, Pedersen JO, Tukey JW (1992) Scatter/gather: a cluster-based approach to browsing large document collections. In: Conference on research and development in information retrieval, Copenhagen, Denmark, June 21–24, pp 318–329
20. Ertz L, Steinbach M, Kumar V (2001) Finding topics in collections of documents: a shared nearest neighbor approach. In: TextMine '01, workshop on text mining, first SIAM international conference on data mining, Chicago
21. Zamir O, Etzioni O, Madani O, Karp RM (1997) Fast and intuitive clustering of web documents. In: Proceedings of the third international conference on knowledge discovery and data mining, 14–17 August, pp 287–290
22. Dhillon IS, Modha DS (2001) Concept decompositions for large sparse text data using clustering. Mach Learn 42(1/2):143–175
23. Steinbach M, Karypis G, Kumar V (2000) A comparison of document clustering techniques. In: KDD workshop on text mining
24. Hsia P, Hsu CT, Kung DC, Holder LB (1996) User-centered system decomposition: Z-based requirements clustering. In: Proceedings of the 2nd international conference on requirements engineering, Colorado Springs, p 126
25. Yaung AT (1992) Design and implementation of a requirements clustering analyzer for software system decomposition. In: ACM/SIGAPP symposium on applied computing: technological challenges of the 1990's, Kansas City, pp 1048–1054
26. Al-Otaiby TN, AlSherif M, Bond WP (2005) Toward software requirements modularization using hierarchical clustering techniques. In: Proceedings of the 43rd annual southeast regional conference, vol 2, Kennesaw, GA, pp 223–228
27. Chen K, Zhang W, Zhao H, Mei H (2005) An approach to constructing feature models based on requirements clustering. In: International conference on requirements engineering, Paris, France, pp 31–40
28. Goldin L, Berry DM (1997) AbstFinder, a prototype natural language text abstraction finder for use in requirements elicitation. Autom Softw Eng 4(4):375–412
29. Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice Hall, Englewood Cliffs
30. Nuseibeh B (2001) Weaving together requirements and architecture. IEEE Comput 34(3):115–117
31. Cleland-Huang J, Settimi R, Zou X, Solc P (2006) The detection and classification of non-functional requirements with application to early aspects. In: IEEE conference on requirements engineering, Minneapolis, MN, pp 39–48
32. SugarCRM, product information. Available at http://www.sugarcrm.com/crm/
33. Cleland-Huang J, Habrat R (2007) Visual support in automated tracing. In: International workshop on requirements engineering visualization, New Delhi, India, October
34. Can F, Ozkarahan EA (1990) Concepts and effectiveness of the cover-coefficient-based clustering methodology for text databases. ACM Trans Database Syst 15(4):483–517
35. Duan C, Cleland-Huang J (2007) Clustering support for automated tracing. In: IEEE international conference on automated software engineering, Atlanta, Georgia, November, pp 244–253
36. Duan C (2008) Clustering and its application in requirements engineering. Technical report #08-001, School of Computing, DePaul University
37. Cleland-Huang J, Berenbach B, Clark S, Settimi R, Romanova E (2007) Best practices for automated traceability. IEEE Comput 40(6):27–35
38. Denne M, Cleland-Huang J (2004) The incremental funding method, a data driven approach to software development. IEEE Softw 39–47
39. Jain AK, Dubes RC (1988) Algorithms for clustering data. Prentice-Hall, Englewood Cliffs
40. Duan C, Cleland-Huang J, Mobasher B (2008) A consensus based approach to constrained clustering of software requirements. Accepted at ACM 17th conference on information and knowledge management, California, October