Higher Education Policy, 2012, 25, (361–379) © 2012 International Association of Universities 0952-8733/12 www.palgrave-journals.com/hep/
Changing the Peer Review or Changing the Peers — Recent Development in Assessment of Large Research Collaborations

Finn Hansson and Mette Mønsted
Department of Management, Politics & Philosophy, Copenhagen Business School, Porcelænshaven 18A, DK-2000 Frederiksberg, Denmark.
Peer review of research programmes is changing. The problem is discussed through a detailed study of the selection process for a call for collaborations in the energy sector issued by the European Institute of Innovation and Technology. The authors were involved in an application for a Knowledge Innovation Community. Through the analysis of this case, the article discusses the role of researchers acting as reviewers in situations where a number of important decision-making dimensions are removed from the researcher's formerly direct influence on the quality assessment. In connection with this discussion, the article provides input to a critical review of the use of quantified scaling developed to systematize a quality assessment on top of the peer review-based assessments. In addition, the article discusses the challenges to the peer review system posed by Mode 2 type science and by cross-disciplinary research collaborations. This is a contribution to the assessment of the new roles of peer reviewers in large policy decision-making systems such as research funding, and an input to further discussions.

Higher Education Policy (2012) 25, 361–379. doi:10.1057/hep.2012.17

Keywords: peer review; research evaluation; indicators; research quality; measurement
Introduction

Policymakers and civil servants in the European Commission are increasingly concerned with how to stimulate economic growth and maintain welfare. Specifically, the European Union's (EU) apparent weakness in converting advanced technological research into innovation, especially when compared with the US and Asia, has occasioned a range of experimental policy interventions in the traditional quality assessment process. Not only do companies and universities face their own particular difficulties here (Leifer et al., 2000); the interests of society and industry are not necessarily identical to those of researchers. The point of departure for this article is a case of specific and highly detailed science policy intervention in the traditional quality assessment process, in which scientists perform their assessment within the policy-designed framework of a quantified assessment system.

The interest in innovation, rather than merely technological research, is evident. But the evaluation of projects that might serve this interest is not an easy task. In the case reported in this article, it became clear in the process of writing the application and negotiating contracts with industry that industry had a different and broader agenda than collaboration on innovation in green energy. Industry expressed demands for more and better graduates: technically talented people who can also think in terms of innovation and the value of technology, that is, graduates with some business skills. This would of course make the graduates more employable and make it easier to work with innovation processes. But it remains a challenge to reach across the various disciplinary and institutional boundaries, bring divergent cultures together, and interfere in the autonomy of universities. A problem often met in this type of collaboration between the technical culture and the business culture1 is the attitude of technical researchers and professors, who often perceive business school competences for bringing technology to the market as marginal and something 'you leave to the marketing people'. This attitude makes it difficult to integrate the idea of innovation in research projects, even where the technical universities declare that this competence is sought (and present) in their graduates.

The question we want to discuss in this article concerns the impact of these changing conditions on the classic and, by nature, somewhat conservative (discipline-bound) quality control exercised by the peer review system. Olbrecht and Bornmann (2010) have reviewed a number of studies of the roles panels play in grant application decisions in cross-disciplinary fields. Their focus is on the social psychology of the decision-making processes in panels, and on situations where all members of the panels are peers. Our contribution has a slightly different perspective. We develop the problems raised by Chubin and Hackett (1990) and ask what happens when the role of the peer specialist/researcher is restricted to preparing the decision rather than participating in the final decision. A central and pressing question remains: Can peer review, when integrated into ever larger policy decision-making systems, be reformed in order to accelerate innovation and still be accepted universally as the only quality control system in science, or will the expansion of such decision models undermine the peer review system?
Peer Review and Scientific Quality Control

The well-established quality control system in science, the peer review system, has always had two intertwined functions: first, to select the highest quality work; second, to support the legitimacy of the outcomes (the assessment) in the scientific community. The first is ensured by using qualified peers in the selection process, the second by communicating a certain transparency in the assessment. It is in itself rather astonishing that the system has worked this way, without major criticism (Ware, 2008), for several decades. Today, however, the introduction of large-scale research collaborations, both trans-disciplinary and cross-disciplinary and with industrial partners, or what some have called Mode 2 research (Gibbons et al., 1994; Nowotny et al., 2003; Hessels and van Lente, 2008) or Triple Helix (Etzkowitz and Leydesdorff, 2000), clearly challenges the traditional evaluation system. This challenge is embedded in a European policy discourse about the role of university research in innovation and economic prosperity. The emerging consensus in this area legitimates intervention in research-based decision making by a number of new actors, producing a whole new setup for the quality assessment process and a challenge to what is left of the autonomy of university researchers (Ernø-Kjølhede and Hansson, 2011).

Peer reviews — the unchallenged 'gold standard' of scientific quality assessment for more than two centuries — are increasingly integrated into new decision-making systems, as in the case discussed here, and our concern is with the consequences of this framing of the peer review system when it is used as part of the decision making (Chubin and Hackett, 1990). The peer review acts as the guarantor of scientific quality, and we can observe changes in the relationship between this assurance system and the decision-making system in the area of science and technology policy. This is an important issue. We find ample evidence to suggest that such changes are occurring in a regular or systematic way, with implications for the scientists who apply for funding, for the reviewers who assess the applications and, perhaps indirectly, also for the institutions using the quality systems. When major changes in the function of the peer review take place, it is important to discuss the short- and long-term implications for the scientific applicants, the reviewers and, no less importantly, the various users.

The assumption of scientific quality as the sole criterion used in the review has produced a special kind of transparency regarding the foundation of the assessment, one that has been accepted by most scientists for centuries and is accepted even today (Ware, 2008), severe criticism notwithstanding (Cicchetti, 1991; Cole, 1998; Bornmann, 2008, 2011). Most of the time, the peer review process is by itself a semi-transparent process, but increasingly we see how the review is framed by specific instructions about what to focus on during the review. We are not talking here about the familiar instructions from the editor on how to report the conclusion, but about much deeper directions on how to focus the review and how to report the conclusions (sometimes in a schematic form with quantified scales), as well as on assessing the potential of the project for the future of research.
The role of reviews in the policy process is also changing. How are the reviews used, and how do they influence the subsequent decision-making process? Who participates in the selection process, and who ultimately decides? Short-listing and selection play an important role in the present case, and we anticipate a growing use of this and other, often non-transparent, systems. In some cases the decision process may easily be interpreted as a challenge to the legitimacy of the researchers involved in the peer process.

Chubin and Hackett's (1990) well-known study of the application and selection processes of the two largest federal funding agencies in the US (NIH and NSF)2 introduced some new organizational and policy-oriented perspectives on the ongoing discussion of the role and function of the traditional peer review-based system of quality control in science. Their study focussed on the peer review of the contributions (project proposals, articles) of the individual researcher, a problem that has been discussed ever since Merton's (1968a, 1942) writings on the role of norms (CUDOS3) in the cumulative creation of knowledge in science and the weaknesses of the system suggested by the Matthew effect (Merton, 1968b). Chubin and Hackett reviewed a number of studies, including Cole's well-known empirical analysis (Cole et al., 1981; Cole, 1998) of the reliability of the peer reviews used by the NSF, carried out by re-evaluating a number of applications; they also surveyed applicants to the NIH and tried to locate the effect that approval or disapproval had on scientists' later careers. By placing the scientific peer review system in an organizational and institutional frame, they discussed some of the effects and functions of the peer review in relation to policymaking, primarily in a number of different science funding systems. At the time of their study, both the NIH and the NSF had undergone major changes in procedure. They had gone beyond the evaluation of purely scientific merit by, first and foremost, setting up independent advisory councils manned by science-policy managers and officers. 'Such criteria of utility', they argue, 'make great sense in these pragmatic times, but they reflect a sharp change from the original "contract" between science and society envisioned by Vannevar Bush in 1945' (Chubin and Hackett, 1990, 9).

The change or introduction of specific criteria in the review process points to a number of new and quite critical problems in relation to the whole selection process and its function in relation to researchers. The changes call into question the traditional selection process based on the use of peer reviews in decisions on projects and grants, because negative reviews often had a decisive effect on the outcome of applications and, when applications were declined, on the future of individual researchers. Also, the well-known facts that reviewers often disagree (Cicchetti, 1991; Bornmann and Daniel, 2009; Olbrecht and Bornmann, 2010) and that reviews are unstable over time (Cole et al., 1981; Cole, 1998) should be a warning against unquestioned reliance on reviews in decision making.
But as long as the peer review process remains transparent and generally accepted (Ware, 2008) by reviewers and scientists, the disagreement can be discussed as a scientific one. What Chubin and Hackett (1990) documented in their study, and warned against, were the consequences of unplanned changes in the quality of the reviewers of grant decisions through the use of a different type of 'peers', who too often acted outside their own area of expertise or who were not specialists in the specific field in question but generalists or science policy officers, because this undermines the basic idea of selecting only projects of the highest possible quality.4 The peer review role changes when peers are not specialists within the field, especially if very different disciplines are involved, often giving a stronger voice to peers who seem to be closer to the field in question and thereby weakening the cross-disciplinary parts (Lamont, 2009; Lamont and Huutoniemi, 2011).

Attempts to use more formal data, such as citations or measures built on bibliometrics, to evaluate research quality will most often collide with the fact that, applied to individuals or small numbers of individuals, the well-known L-curve (Seglen, 1997) relating the number of scientists to the number of citations received produces statistically invalid knowledge. To this we may add that, as Starbuck (2005) has demonstrated, there is no necessary relation between high-impact journals and the number of citations to individual articles in those journals. Even if these methods are easy to use and economical, their use does not seem to be a solution to the problem of the growing costs of evaluating research quality. In a large empirical analysis of the operations of the Netherlands Economic and Social Science Research Council (van den Besselaar and Leydesdorff, 2009), the authors are able to demonstrate the functions and limits of the peer review system in selecting research proposals by quality. Besides demonstrating that quantitative indicators (citations) do not give a better prediction than peer review-based selection, the article points to existing knowledge of bias and distortion in the selection process, which in this case is limited solely to disciplinary-based quality criteria. The authors conclude that instead of fruitless efforts to improve procedures (and statistical indicators) for selecting individual projects, 'the main issue is to ensure that the system works properly despite uncertainties' (286).
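The statistical point is easy to illustrate with a small simulation (ours, not from the studies cited; the lognormal shape and its parameters are illustrative assumptions in the spirit of Seglen's findings). Because citation counts are heavily skewed, a journal-level average is driven by a few highly cited articles and says little about any individual article:

```python
import random
import statistics

random.seed(42)

# Simulate a skewed, Seglen-style citation distribution: most articles
# gather few citations, a handful gather very many. The lognormal
# parameters below are illustrative assumptions only.
articles = [int(random.lognormvariate(mu=1.0, sigma=1.5)) for _ in range(1000)]

mean_citations = statistics.mean(articles)      # what drives an 'impact factor'
median_citations = statistics.median(articles)  # what a typical article gets
top_decile_share = sum(sorted(articles)[-100:]) / sum(articles)

print(f"mean citations per article:   {mean_citations:.1f}")
print(f"median citations per article: {median_citations:.1f}")
print(f"citations held by top 10% of articles: {top_decile_share:.0%}")
# The mean comes out several times larger than the median, and the top
# decile holds roughly half of all citations, so journal-level averages
# are poor guides to individual performance (cf. Seglen, 1997; Starbuck, 2005).
```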
The introduction of criteria like utility, relevance (and merit), and cross- and trans-disciplinarity in the decision-making process tends to isolate and encapsulate the traditional peer review, which, if we follow Merton, had focused solely on discipline-based scientific quality (Merton, 1968a; Bornmann, 2011). Instead, a new conflict has slowly evolved between scientific accountability, controlled by the scientific community through peer review, on the one hand, and the political-administrative decision-making systems, which take utility and relevance into account, on the other. We can also see the contours of new strategic roles for the researchers acting as reviewers. A number of studies (Langfeldt, 2006; Lamont, 2009) have already shown that as soon as we move away from traditional disciplines in the peer review system, the assumption that the assessment is based on scientific quality alone cannot be upheld. In cross- and trans-disciplinary peer review committees, decisions on quality are the result of negotiation, in which reviewers who are relatively foreign to the area reviewed tend to accept the review of the scientist closest to the area. This is in fact rather close to the warning intoned by Chubin and Hackett (1990) many years earlier. The ongoing discussion in the UK on the implementation of a new research evaluation system (REF) in place of the now more than 15-year-old RAE contains some of the same problems, because a new dimension, 'societal impact', has to be evaluated by new actors, not necessarily scientists (Ernø-Kjølhede and Hansson, 2011).

In this article we will discuss the abovementioned changes in the peer review system. Other performance measures will probably have to be considered in order to shift the focus from pure university-based research toward the innovation and application of knowledge. The use of bibliometric measures, like citations and H-indexes, and the related focus on international journal articles are becoming ever more important, not only for assessing individual researchers, but also for assessing departments and universities for basic research funding, even though bibliometric measures chart past performance. These processes make it more difficult to motivate the work necessary to complete new kinds of collaborative applications and practice-based innovations, as it should be evident that retrospective bibliometric measures cannot capture real innovation. Only if we are lucky may they reveal competence systems that might be able to produce innovation. This implies a paradox in the policy field. On the one hand, decision makers want more classic disciplinary science, using journal articles as the hallmark of high-quality research. On the other, they want to invoke other performance criteria related to the application of the research for the benefit of society. The latter, however, is not really measured or used for comparing universities (Ernø-Kjølhede and Hansson, 2011).
Justified and Argued Belief

Quality assessment in the academic world has been, and still is, primarily based on the system of peer review, but we can observe tendencies either to supplement the peer reviews with other systems or to limit their role. The use of secret balloting, as in the case to be analysed, is one rather radical means to decide the issue when facing a number of conflicting interests. It might appear almost scandalous, as it reduces the quality assessment by the peers to only one of the elements in the decision making. But half a century after Lindblom's 'The Science of "Muddling Through" ' (1959), and 40 years after the 'garbage can theory' (Cohen et al., 1972), the idea of a fully transparent decision-making system should perhaps be regarded as a relic of a bygone age.

In this situation, the authority of the scientific quality assessment and the need to include societal and policy demands in the final decision created a new situation. The EU Commission and other research grant foundations, however, seem unable to challenge the existing scientific and discipline-oriented quality assessment system (the peer review) when it comes to including relevance and societal impact in the overall assessment. HEFCE in the UK has opened up this problem by including not only scientific quality but also societal impact and the quality of the research organization in the overall assessment of UK higher education institutions in the new REF 2014 model.5 There are attempts to promote alternative or new assessment criteria in the overall assessment systems, where societal and policy demands have an open and legitimate role alongside the scientific quality assessment. The result is a situation where the peers are in reality taken hostage by administrators and policymakers. They are asked to review in an unknown context, and the decision makers have to live with suspicions of foul play, specifically because the legitimacy of academically grounded assessments is replaced by balloting that is not connected to explicit arguments for and against particular projects. In Europe, the criteria for competing interests are developed in order to find relevant, competent people without a specific bias, but in such large-scale European projects it may be more difficult to identify the 'interests' as their scope broadens and comes to include national and organizational interests.
Peer Review and the Need for Reviewing Innovations

As already mentioned, there is an ongoing discussion between reviewers on how to define and decide on criteria, both in well-defined disciplines and in cross-disciplinary studies. The value of the reviewers is evident in both, but reviewers in cross-disciplinary fields may tend to disagree more often than reviewers from similar disciplines (Langfeldt, 2006), or they may tend to lean on a dominant disciplinary discourse (Laudel, 2006; Lamont, 2009). Peer reviewers assess according to both discipline and research field, which means that some reviewers have highly developed cross-disciplinary skills within a research theme. In these studies, conflicts and disagreements between peers play an important role in the decision, but it is important to notice that the outcome is the result of a process played out between the peers alone, even if they are from very different disciplines.

Introducing elements like innovation and entrepreneurship to the disciplinary mix does not make the decision-making process easier. The problem is how the selected peers evaluate discrepancies between the social and the technical aspects of innovation and entrepreneurship in, say, sustainable energy. The technical research has to be feasible and advanced, which is easier for the technical reviewer to assess, but the question of whether and how to turn it into an innovation that survives the market may not lie within his or her field of expertise. In scientific committees, deciding on funding is a negotiation process in which different peer opinions are challenged in order to reach a decision. When disciplinary fields are contested and complex (Whitley, 2000) or cross-disciplinary, the decision making is complex as well, because peer reviewers relate the assessment to a scientific discourse of existing science and its disciplinary structure. Assessing the potential for innovation and the necessary management and organization is much harder to do because, whereas the scientific quality assessment is based on performance and most often refers to an existing paradigm or tradition with the same kind of criteria or the same discourse, the field of innovation cannot be assessed by its place in a paradigm or discipline but only on its potential. This creates a very different issue and a different role for reviewers.

We will demonstrate in the following case that the fear of eroding the peer review system, first expressed by Chubin and Hackett (1990) some 20 years ago, has materialized in much more elaborate and formalized systems, where qualitative peer judgements are forced into quantitative decision-making systems based on a Likert scale.6 The kind of solution to this problem found in the case presented in this article is to create a proxy for the peer review system. The proxy aims at upholding the scientific legitimacy and authority normally associated with the peer review system, but in such a way that the peers are asked to review along very restricted dimensions and to express the value on a Likert scale. The dimensions are treated as additive, that is, all criteria carry the same weight. But the content, the originality and the balance between the dimensions are not assessed as the fundamental and necessary condition. Traditional scientific quality assessment then counts for less than half of the total of 100 points, the rest being for organizational qualities and for managerial issues like business plans and the like.7
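To see how an additive, equal-weight scheme can marginalize the scientific assessment, consider a minimal sketch (the criterion names, point ranges and scores below are our own illustrative assumptions, not the actual KIC scoring sheet):

```python
# Hypothetical equal-weight criteria, each rated 0-20 on a Likert-style scale.
CRITERIA = ["scientific quality", "novelty", "business plan",
            "governance", "co-location plan"]

def total_score(ratings):
    """Sum the equal-weighted ratings; nothing weighs or balances them."""
    return sum(ratings[c] for c in CRITERIA)

# Proposal A: excellent science, weak managerial packaging.
a = {"scientific quality": 20, "novelty": 18, "business plan": 8,
     "governance": 9, "co-location plan": 8}
# Proposal B: average science, polished business and governance sections.
b = {"scientific quality": 12, "novelty": 11, "business plan": 18,
     "governance": 17, "co-location plan": 17}

print(total_score(a), total_score(b))  # 63 vs. 75: B wins on the additive scale
```

Because the criteria are simply summed, strong managerial scores can compensate for mediocre science, and nothing in the arithmetic assesses the balance between the dimensions.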
The use of metrics in the evaluation of science and technology has been discussed by Kostoff and Geisler, who, after reviewing a number of studies, conclude that:

The complexities of selecting, employing and interpreting S&T metrics invariably lead to the creation of intended and unintended consequences to S&T-performing organizations. ... We therefore recommend that before implementing specific metrics for application to any part of the S&T development cycle, the organization should identify and evaluate the intended and unintended consequences from the implementation of specific metrics. In addition, the organization must identify the impacts of these consequences on the strategic goals and the core mission of the organization, and to appropriately adjust and correct for such impacts. (Kostoff and Geisler, 2007)

We know that governance models for collaborations in the area of sustainable energy research are complicated and have to reflect their context and purpose (Hansson et al., 2009). We also know from this project that the collaboration has to be based on personal relations and mutual trust, and that this is not formed easily in a standardized way. From the field where most research has been done on collaboration between industry and universities, namely the pharmaceutical industry, there seem to be very different management models for research and for the development of the innovation component of the process (Chiesa and Manzini, 1997). Whereas the research collaboration is personal and open, the development part is very structured and tied to very specific contracts (ibid.). In this design for the application, it is much more difficult to create this kind of flexible and adaptable organization and governance. What can be assessed are reflections on how to create management models, but this assessment can be conservative, based on other experience or on myths and fears of specific models such as, for example, Networks of Excellence. The design of the matrix structure, the decision-making structure and the power hierarchy is highly uncertain, and it will mainly depend on the CEO recruited, who cannot be recruited before the programme is selected. In assessing these dimensions, therefore, arguments tend to be expressions of beliefs or opinions: 'We do not believe this structure will work' or 'the most important thing is that the structure is simple'. But how can the structure be simple in a cross-disciplinary, cross-country, university-industry collaboration with more than 15 partners? Arguments here are as often grounded in myth as in hard fact.

The dilemma is visible: costly and time-consuming peer review procedures versus simpler, formalized, quantitative data-based decision systems for selecting the programmes and projects to be funded. On the one hand, it is important that decision making in selecting programmes for funding and other support is fair, even if this is often burdened with costly and time-consuming procedures; on the other, if formalization reduces the qualitative input to the final assessment to a few simple measures, we not only risk selecting average or mediocre programmes but also risk undermining the authority of scientific quality. Traditional decision models based on numbers are not necessarily good for developing new, innovative and entrepreneurial cultures. This, then, is an area of management studies that could fruitfully be developed by drawing on research in the field of science and technology studies.
The case selected is characterized by a number of new challenges:

- It is a cross-disciplinary case, combining a technical field with skills in innovation, entrepreneurship, business and management.
- It is a huge project that cuts across the boundaries between university and industry.
- It is an educational programme focusing on innovation and entrepreneurship education for engineers, not on engineering research.
- The call is exceptionally detailed on criteria and on the quantification of qualitative data.
- It is the first attempt by an independent EU body (the EIT) to set up a procedure for selecting large projects in a new area of sustainable energy, combining research collaboration, innovation and knowledge dissemination.

Both authors had direct access to material from the process, both from preparing an application as part of a pilot programme (SUCCESS) and from following the applications through the different levels of assessment and evaluation. In a sense, we follow the idea of self-ethnography described by Alvesson (2003), using our contacts and network to collect, and not least to evaluate, the information needed in order to analyse the decision-making processes. The case could also be labelled an instrumental case study in Stake's (2001) terminology, because our interest is to use the case to demonstrate some key problems in the evaluation of science and research-based activities. Through direct participation in the application process we gained insights and access to information that could otherwise have been difficult to get: not necessarily because of secrecy, but because of the enormous amount of documentation this large-scale EU programme contains, where it can be extremely difficult to locate key documents without prior knowledge of the programme. Our closeness to the process and our participation in one of the two selected applications could of course result in a 'go native' (Alvesson, 2003, 172) attitude toward the process, but in order to keep some distance we, first, postponed our analysis of the material until over a year after the decision and, second, discussed the selection and interpretation of the material with departmental colleagues who have no connection to the Knowledge Innovation Community (KIC) programme. Following this approach, we feel that we have reduced the elements of bias in our analysis of this interesting and novel model for selecting large-scale science-based projects.
Case: Application to the EIT8 Board for a KIC

In this case we will discuss the consequences of integrating the peer review system into a large and complex evaluation process. As an attempt to boost Europe's ability to convert technical research into innovations and products in order to create value for society, the European Commission decided in 2007 to create a new organization, the European Institute of Innovation and Technology, to organize frameworks for new programmes focussing on innovation.

A note on method and data collection is necessary. The case was not originally designed as a research case but was what might be called a 'living' case, in the sense that we were part of a large collaborative application for a KIC. Drawing on experience from our participation in preparing the application and following it through the final decision-making process, we stumbled upon problems and dilemmas that seem to be relevant and important in the handling of large-scale research applications, both in the EU and in national contexts. It was therefore necessary for us to rely on documentation and memory in order to reconstruct the processes afterwards, instead of collecting information systematically during the process by interviewing other participants. It is, of course, possible to question the validity of our claims, but we will make clear where the claims are based on direct participatory observations and where they are based on documentary material.

Four pilot projects for the later EIT process were initiated within different technologies in 2008: sustainable energy, nanotechnology and electronic systems, green transportation, and nano-medicine. As a new element, these pilot projects included a social science or business economics angle. The SUCCESS9 project included two business schools, CBS in Copenhagen and ESADE in Barcelona, which were involved at a late stage. Technical universities and industry were involved in the pilot projects, so that these could be the first step toward the next round of applications for the KIC, the next prestigious programme for technical universities. The pilot projects became platforms for networks to form groups for applications for the KIC. The first presentation at the European Institute of Innovation and Technology stressed that the focus was not technology or technical research, and in fact emphasized that funding was not going to technology, but to innovation and entrepreneurship to secure the value of the technical research.

In this case study we will present the criteria in the call for the three themes, sustainable energy, information technology and climate, as these are important for the argument about the changing role of peer reviewers. The application for a new KIC was specified in great detail, stating the number of pages to be used to describe the different areas of the application as well as the rather complicated procedure with several levels of decision making. It should be noted that the whole application is constructed in such a way that it can only be evaluated by a mixture of qualitative and quantitative evaluation systems. To illustrate the level of detail in the instructions, the required content of the different parts of the application is described as follows:
Part B — (indicative length: 20 pages); this part is the body of the proposal and describes the KIC, what it wants to accomplish, its strategy, activities, work programme and processes, business plan (including provisions for sustainable and long-term self-supporting financing), its co-location plan, its IPR plan and the plans for dissemination of the KIC's activities and innovative education models;

Part C — (indicative length: 15 pages); this part addresses the credibility of the proposal, particularly in terms of the organization, management and governance of the KIC, the individual capability, capacity and international reputation of the partners and how this is integrated into a coherent world-class innovation chain from education through to economic and societal impact. (KIC InnoEnergy, 2010)

The demands for documentation in line with the details required under B and C are overwhelming and complex, partly because the application deals with a new and complex organizational construct: selecting the flagship for the new institutional arrangement for supporting innovation and entrepreneurship across Europe, using new tools. The new board for this institution, the EIT, is independent of the Commission, and extra criteria were added by the board. The call contains a detailed description of the process and the criteria of the evaluation, but the overall structure is a limited number of pages describing the contribution in the stated areas: the substantial chapter B with the body of the project is allowed 20 pages, while the credibility and governance part is set to 15 pages. The evaluation then summarizes the total of the content in B and C with up to 20 points for content and 20 for novelty for each section, out of a total possible score of 200. The main idea in this section is to evaluate the projects on content, tools and performance criteria, as announced in the application call. The performance criteria are heavily related to business criteria and not to how universities normally evaluate these activities. An evaluation group with competence in the technical field performs the evaluation and the scoring in step 1. The five highest-scoring proposals from each priority area continue to the next step of the evaluation process. In the second step of the assessment, the evaluation still works with scaling and scoring, and there is a maximum of another 100 points; the areas to be evaluated all concern management, commitment, organization and partners. The final score of a proposal is the sum of the scores for steps 1 and 2, a maximum of 200. The three top-scoring proposals from each priority area will be examined by an independent panel of experts (KIC InnoEnergy, 2010).
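Read as an algorithm, the published procedure reduces to sorting on additive scores. The sketch below traces the two-step pipeline for one priority area (the proposal names and scores are invented; only the mechanics, a maximum of 100 points per step, the top five continuing after step 1, and the top three going to the expert panel, follow the call as we read it):

```python
from typing import NamedTuple

class Proposal(NamedTuple):
    name: str
    step1: int  # technical evaluation group: content + novelty, max 100
    step2: int  # management, commitment, organization, partners, max 100

def select(proposals):
    """Two-step selection within one priority area."""
    # Step 1: only the five highest-scoring proposals continue.
    shortlist = sorted(proposals, key=lambda p: p.step1, reverse=True)[:5]
    # Step 2: the final score is the plain sum of both steps (max 200);
    # the three top-scoring proposals go to the independent expert panel.
    return sorted(shortlist, key=lambda p: p.step1 + p.step2, reverse=True)[:3]

# Hypothetical proposals in one priority area.
area = [Proposal("P1", 82, 60), Proposal("P2", 75, 88), Proposal("P3", 70, 55),
        Proposal("P4", 68, 90), Proposal("P5", 66, 40), Proposal("P6", 64, 95)]

for p in select(area):
    print(p.name, p.step1 + p.step2)
# P2 163, P4 158, P1 142. Note that P6's strong step-2 score never counts,
# because it was eliminated at step 1 before anyone summed anything.
```

The point of the sketch is only that every qualitative judgement has been collapsed into two integers per proposal before the expert panel ever sees the applications.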
In the section on management and partners, emphasis is placed on setting up legal units across universities, research institutes and industry, across countries, in order to have a uniform management structure for assigning responsibility. But in the real world of complex local, state and EU regulations this is not an easy task, and none of the applicants could do it at the time of application. Indeed, it is simply not possible for most of the involved institutions to form a joint legal unit.
Assessment by a Final Selection Expert Panel

At the end of step 2, a final panel will examine the three top-scoring proposals from each priority area (nine proposals in total) together with their associated evaluation reports. The final panel will prepare a report for the EIT Governing Board. It is expected to:

- adjust, if necessary, the final scores to ensure consistency between the three priority area evaluation panels;
- make recommendations concerning the selection of the designated KICs;
- make recommendations concerning the way in which the selected proposals need to be improved or strengthened.

The final panel 'adjusts if necessary the final scores'. This is interesting in an assessment procedure, because there are no written indications of the conditions under which the final panel can 'adjust' the evaluation of scientific quality in consideration of societal and policy demands. First, the procedure forces scientific peers to formulate their assessment in numbers, using a Likert scale; then an expert panel, which will not have detailed knowledge of all areas of all projects, has the opportunity to reshuffle the numbers assigned by the reviewers. By creating such a grey area around a traditional and respected quality assessment system (the peer review), the process introduces a number of new political problems with process transparency, opens the door to suspicions of 'foul play' by outsiders, and risks de-legitimizing scientific quality as the basis for selection (van den Besselaar and Leydesdorff, 2009).

The use of Likert scaling at all levels of the assessment procedure, in order to be able to summarize the assessment in a few numbers, is a tempting way to simplify the process. But the Likert scale technology should not be put to use so easily, considering the immensely complicated dimensions and the different types of data involved in quantifying qualitative data. Scaling and the associated numbering, leading to a hierarchical differentiation, do not by themselves solve the problem of balancing the scores. The uncertainty remains, but it is now less visible, hidden behind numbers. How much is actually technology and how much is innovation and entrepreneurship, or a business plan? How is the balance between these to be assessed, and by whom? Such a procedure cannot substantiate the claim that the best scientific quality will win.
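What an unconstrained 'adjustment' can do to a ranking is easy to show. In the sketch below, the panels, proposals, scores and the adjustment rule are all our own illustrative assumptions (the call specifies no rule at all): a final panel that merely rescales one priority area's scores 'for consistency' reverses the order the reviewers produced.

```python
# Final scores (max 200) as handed over by two hypothetical priority-area panels.
raw = {"energy":  {"E1": 171, "E2": 158},
       "climate": {"C1": 166, "C2": 149}}

def adjust(scores, area, factor):
    """Rescale one area's scores 'to ensure consistency between panels'.
    No written indication constrains the choice of factor."""
    return {a: {name: (s * factor if a == area else s)
                for name, s in props.items()}
            for a, props in scores.items()}

# Suppose the final panel judges the climate panel to have scored 10% too low.
adjusted = adjust(raw, "climate", 1.10)
ranking = sorted(((s, name) for props in adjusted.values()
                  for name, s in props.items()), reverse=True)
print([name for _, name in ranking])
# ['C1', 'E1', 'C2', 'E2']: C1 (182.6) now outranks E1 (171), although no
# reviewer changed a single judgement, and nothing records why.
```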
The many rule-specified areas for the assessment of proposals, and the weights associated with the dimensions, give a probably intended impression of seriousness and objectivity. This is somewhat strange considering the many ongoing discussions in Europe on how to evaluate the societal impact of all types of research, on the usefulness of applied research, on the strengths of research organizations, and so on. We only have to look at the English discussions of the proposal from HEFCE to replace the RAE (HEFCE, 2009). The new system is to enhance the research evaluation system with societal dimensions as well as a measure of the strength of research organizations, and these will be assessed by quality profiles produced by expert reviews (HEFCE, 2012), not by introducing scaling systems.

Two projects in each of the three themes were shortlisted through this procedure by the selection panel and invited to present their case to the board in 20 minutes. The board then voted by secret ballot for one of the two. Where selection is done in secret, members do not have to argue for their vote or for their weighing of different quality elements, and this creates uncertainty about the criteria in the final decision. The composition of the board therefore becomes crucial. These KICs are seen nationally as the projects that are going to set the agenda for research in the coming years, and there was heavy lobbying of members of the board throughout the process. In other national and international boards, the voting is not secret, but the final decision is of course regarded as the decision of the board, and nobody is expected to reveal the voting of the different members. But going to the extreme of secret balloting implies that industry and national interests can be more easily protected, and it appears as just one more step toward a political and non-transparent decision process.

The review process in cross-disciplinary and Mode 2 research is complicated, as has been demonstrated in several studies; the selection of primary and secondary projects should be open for negotiation by expert reviews and not decided in advance. In the case, the criteria seem to be clear on management, innovation and entrepreneurship, but unclear with regard to scientific quality and especially the role it should play in the final decision making. The role of peer reviews, and the background and reference points determining what type of reviewers should be used, remain unclear: the call emphasizes innovation and entrepreneurship, and whether the reviewers come from the relevant scientific fields is not stated. In relation to reviewers this is a challenge, as technical reviewers may assess the academic and technical content, but a very important part of the assessment is then based on the business plan, the management and the organization, rather than on the scientific-technical content. This raises the question of the selection of academic reviewers and their roles and competences. If the most important part of the review is based on the core competences of management consultants, does this leave space for other types of reviewers? This, as well as the extensive use of scaling in the assessment, may raise serious doubts about the existing system of scientific reviewing processes.
Perspectives

The classic peer review system has been an integrated part of the quality selection system in universities and higher education for centuries. The evaluation or outcome of peer review processes still commands a high level of trust and confidence among scientists worldwide, and its use is largely uncontroversial. The flexibility of the peer review system has made it adaptable to many different situations in universities, higher education institutions and science policy, and the system has shown its resistance to change over the years. Later, the use of the peer review system was extended to the evaluation of institutions and research proposals, changing the framework of the peer review in what has sometimes been defined as strong evaluation systems (Whitley, 2010). But success has its price. Rising demands for accountability and for visible results from public investments have also reached national as well as EU research and innovation funding bodies. Together with the rapid growth of external funding of university research and the demands for cross-disciplinary research and collaboration with industry in national and EU research funding, this has placed much stress on the classic peer review system because of time, costs and difficulties in finding reviewers. One answer to this dilemma can be found in national as well as EU funding bodies, where we can observe the emergence of a number of quantitative evaluation systems drawing on data from the bibliometric citation systems. The new quantitative evaluation models use either combinations of peer evaluations and specific scaling systems or various calculations provided by the commercial providers (the impact factor).

In this article we have analysed the use of a special combination of the flexible peer review system and a quantitative Likert scaling system, used in order to make funding decisions. What we have seen are tendencies toward decoupling the expertise of the peers and strengthening the decision-making power of the research officers in the funding body who are in charge of organizing the calculation systems. Over 20 years ago, Chubin and Hackett (1990) observed the use of research administrative officers in the final decision-making process. Today, we have a system in which the role of the peers is in many situations reduced to producing input for the decision makers. We assume that this tendency will have a lasting impact far beyond the specific funding system. One reason is that researchers will adapt all too easily to these new demands; we have seen how fast researchers adapt to new evaluation systems in connection with research funding, as observed in the development of the RAE system (Geuna and Martin, 2003; Martin, 2003; Barker, 2007). The other is the tendency to undermine the role and status of the peer review system, because the growing use of scale-based systems will reduce the areas where peer review is used, and scientists will observe the rise in the use of quantified evaluation systems. These are possible tendencies that may or may not develop.

What is more critical, seen from the standpoint of selecting the best possible research, is what Merton (1942) once called the unintended consequences of well-designed projects. Setting up scale-based measurement systems to evaluate new ground-breaking research tends to define beforehand what can be measured, and this will have consequences. What is at risk is the possibility of giving space to new breakthrough research, or what Kuhn (1970) called paradigm shifts, as the quantified scaling systems tend to favour incremental changes in research, not revolutions, and to be more accessible to normal research agendas and less so to alternatives. By presenting our case of decision making for research funding based on a quantified scaling system, where peers have no role, or a very reduced one, in discussing the outcome, we hope to push further the discussion of the role of peer reviewers in cross-disciplinary and collaborative projects and of what role quantitative measurement systems should play. In cross-disciplinary projects, new competences and new combinations of knowledge are relevant, and it has to be considered how the role of the traditional academic peer reviewer is changing, and what new systems of combined reviewers could be developed. But we also hope to raise a discussion of the closed processes of quantification-based decision making and secret balloting in relation to research funding, which are foreign to the justified and argued assessments of classic peer review and novel in higher education.

At the beginning of the article we formulated the question: Can the peer review, integrated into ever larger policy decision-making systems, be reformed in order to accelerate innovation and still be accepted universally as the only quality control system in science and higher education institutions, or will the expansion of such decision models undermine the peer review system? We will close with a partial answer. To our mind, the Likert scaling decision model we have seen in this case will hamper real breakthroughs in science and innovation. Any revision of the process must respect an open dialogue with the expert reviewers.
Acknowledgments

The authors thank Thomas Basbøll, CBS, for his comments on the final version.
Notes

1. C.P. Snow opened up the discussion of the conflict between the humanities and the natural sciences in 1959 in his famous Rede Lecture (Snow, 1993).
2. NIH: National Institutes of Health; NSF: National Science Foundation.
3. The CUDOS norms, describing the ideal norms of behaviour for scientists if the best possible scientific results are to be promoted, were developed by Merton in the wake of crude state interference in science before WW2 in Germany and the USSR (Merton, 1938, 1942, 1968a).
4. A recent survey of more than 4,000 researchers' views on the peer review system (Ware, 2008) disclosed unexpectedly large support for the classic system in light of the often harsh critique from researchers and policymakers; in the survey the researchers acknowledged the criticisms raised against the peer review system (bias, etc.) but chose to support it nevertheless.
5. http://www.ref.ac.uk/ (accessed 22 May 2012).
6. From historical studies of the genesis of standardized quantification systems we know how difficult and time-consuming the quantification of formerly qualitative processes can be (Porter, 1992).
7. At step 2, the 'Management, governance and organization of the partnership and co-location, covering also financial and legal aspects of the KIC' covers 50 of the 100 points, and the other 50 come from the combined strength of the partners. At step 2, content and originality are gone, and only financial, legal and management issues are at stake. But how is this assessed? That is, content and novelty do not count in the second round.
8. The European Institute of Innovation and Technology, an independent institution under the EU.
9. SUCCESS = Searching Unprecedented Cooperations on Climate and Energy to ensure Sustainability.
References

Alvesson, M. (2003) 'Methodology for close up studies — struggling with closeness and closure', Higher Education 46(2): 167–193.
Barker, K. (2007) 'The UK research assessment exercise: the evolution of a national research evaluation system', Research Evaluation 16(1): 3–12.
Bornmann, L. (2008) 'Scientific peer review: an analysis of the peer review process from the perspective of sociology of science', Human Architecture 33(2): 23–38.
Bornmann, L. (2011) 'Scientific peer review', Annual Review of Information Science and Technology 45: 199–245.
Bornmann, L. and Daniel, H.D. (2009) 'Reviewer and editor biases in journal peer review: an investigation of manuscript refereeing at Angewandte Chemie International Edition', Research Evaluation 18(4): 262–272.
Chiesa, V. and Manzini, R. (1997) 'Managing virtual R&D organizations: lessons from the pharmaceutical industry', International Journal of Technology Management 13(5/6): 471–485.
Chubin, D.E. and Hackett, E.J. (1990) Peerless Science. Peer Review and U.S. Science Policy, Albany, NY: State University of New York Press.
Cicchetti, D. (1991) 'The reliability of peer review for manuscript and grant submission', Behavioral and Brain Sciences 14(1): 119–186.
Cohen, M.D., March, J.G. and Olsen, J.P. (1972) 'A garbage can model of organizational choice', Administrative Science Quarterly 17(1): 1–25.
Cole, S. (1998) 'How does peer review work and how can it be improved?' Minerva 36(2): 179–189.
Cole, S., Cole, J.R. and Simon, G.A. (1981) 'Chance and consensus in peer review', Science 214(4523): 881–886.
Ernø-Kjølhede, E. and Hansson, F. (2011) 'Measuring research performance during a changing relationship between science and society', Research Evaluation 20(2): 130–142.
Etzkowitz, H. and Leydesdorff, L. (2000) 'The dynamics of innovation: from national systems and "mode 2" to a triple helix of university–industry–government relations', Research Policy 29(2): 109–123.
Geuna, A. and Martin, B. (2003) 'University research evaluation and funding: an international comparison', Minerva 41(4): 277–304.
Gibbons, M., Limoges, C., Nowotny, H., Schwartzman, S., Scott, P. and Trow, M. (1994) The New Production of Knowledge. The Dynamics of Science and Research in Contemporary Societies, London: Sage Publications.
Hansson, F., Brenneche, N.T., Mønsted, M. and Fransson, T. (2009) 'Benchmarking successful models of cooperation', SUCCESS Work Package 1. Karlsruhe, http://openarchive.cbs.dk/handle/10398/6347.
HEFCE. (2009) Research Excellence Framework, 2009. Second Consultation on the Assessment and Funding of Research, September 2009/38, London: HEFCE.
HEFCE. (2012) REF 01.2012 Panel Criteria and Working Methods, London: HEFCE.
Hessels, L.K. and van Lente, H. (2008) 'Re-thinking new knowledge production: a literature review and a research agenda', Research Policy 37(4): 740–760.
KIC InnoEnergy. (2010), http://eit.europa.eu/kics1/kic-innoenergy.html, accessed 1 February 2012.
Kostoff, R.N. and Geisler, E. (2007) 'The unintended consequences of metrics in technology evaluation', Journal of Informetrics 1(2): 103–114.
Kuhn, T.S. (1970) The Structure of Scientific Revolutions, Chicago, IL: University of Chicago Press.
Lamont, M. (2009) How Professors Think. Inside the Curious World of Academic Judgment, Cambridge, MA: Harvard University Press.
Lamont, M. and Huutoniemi, K. (2011) 'Comparing Customary Rules of Fairness: Evaluative Practices in Various Types of Peer Review Panels', in C. Camic, N. Gross and M. Lamont (eds.) Social Knowledge in the Making, Chicago, IL: University of Chicago Press, pp. 209–232.
Langfeldt, L. (2006) 'The policy challenges of peer review: managing bias, conflict of interest and interdisciplinary assessment', Research Evaluation 15(1): 31–41.
Laudel, G. (2006) 'Conclave in the Tower of Babel: how peers review interdisciplinary research proposals', Research Evaluation 15(1): 57–68.
Leifer, R., McDermott, C.M., O'Connor, G.C., Peters, L.S., Rice, M. and Veryzer, R.W. (2000) Radical Innovation: How Mature Companies Can Outsmart Upstarts, Boston, MA: Harvard Business School Press.
Lindblom, C.E. (1959) 'The science of "muddling through"', Public Administration Review 19(2): 79–88.
Martin, B.R. (2003) 'The Changing Social Contract for Science and the Evolution of the University', in A. Geuna, A.J. Salter and W. Edward Steinmueller (eds.) Science and Innovation, Cheltenham: Edward Elgar, pp. 7–27.
Merton, R. (1938) 'Science and the social order', Philosophy of Science 5(3): 321–337.
Merton, R.K. (1942) 'The Normative Structure of Science', reprinted in R.K. Merton (1973) The Sociology of Science: Theoretical and Empirical Investigations, Chicago, IL: University of Chicago Press, pp. 267–280.
Merton, R.K. (1968a) 'Science and Democratic Social Structure', in Social Theory and Social Structure, New York: The Free Press, pp. 604–615.
Merton, R. (1968b) 'The Matthew effect in science', Science 159(3810): 56–63.
Nowotny, H., Scott, P. and Gibbons, M. (2001) Re-thinking Science. Knowledge and the Public in an Age of Uncertainty, Oxford: Polity Press.
Nowotny, H., Scott, P. and Gibbons, M. (2003) '"Mode 2" revisited: the new production of knowledge', Minerva 41(3): 179–194.
Olbrecht, M. and Bornmann, L. (2010) 'Panel peer review of grant applications: what do we know from research in social psychology on judgment and decision-making in groups?' Research Evaluation 19(4): 293–304.
Porter, T.M. (1992) 'Quantification and the accounting ideal in science', Social Studies of Science 22(4): 633–652.
Seglen, P.O. (1997) 'Citations and journal impact factors: questionable indicators of research quality', Allergy 52(11): 1050–1056.
Snow, C.P. (1993) The Two Cultures, Cambridge: Cambridge University Press.
Stake, R.E. (2001) 'Case Studies', in N.K. Denzin and Y.S. Lincoln (eds.) Handbook of Qualitative Research, Vol. 3, London: Sage, pp. 435–454.
Starbuck, W.H. (2005) 'How much better are the most-prestigious journals? The statistics of academic publication', Organization Science 16(2): 180–200.
van den Besselaar, P. and Leydesdorff, L. (2009) 'Past performance, peer review and project selection: a case study in the social and behavioral sciences', Research Evaluation 18(4): 273–288.
Ware, M. (2008) Peer Review in Scholarly Journals: Perspective of the Scholarly Community — An International Study, London: Publishing Research Consortium.
Whitley, R. (2000) The Intellectual and Social Organization of the Sciences, Oxford: Oxford University Press.
Whitley, R. (2010) 'Changing Governance of the Public Sciences', in R. Whitley and J. Gläser (eds.) The Changing Governance of the Sciences, Dordrecht: Springer, pp. 3–30.