Hum Genet (2002) 110 : 524–525 DOI 10.1007/s00439-002-0726-2
WEBSITE REVIEW
Oskar A. Haas
Felix Mitelman: Database of chromosome aberrations in cancer http://cgap.nci.nih.gov/Chromosomes/Mitelman
Received: 26 February 2002 / Published online: 10 April 2002 © Springer-Verlag 2002
Linus Torvalds, the founder of the Linux computer operating system, once remarked, “Software is like sex. It is best when it is free.” What applies to software is, of course, also true for databases. What would we all do without those far-seeing colleagues who for decades not only took on the time-consuming task of painstakingly collecting data from the literature, but also unselfishly made all this information available free of charge?
For cancer cytogeneticists, the standard reference database has always been the “Catalog of Chromosome Aberrations in Cancer” by Felix Mitelman and his co-workers, Bertil Johansson and Fredrik Mertens. Following five printed versions and a brief interlude on CD-ROM, the database was put on the web more than a year ago. It is continuously maintained, updated quarterly, and comprised 39,847 cases as of November 2001. The compilation of such an amount of cytogenetic data from publications by Mitelman’s group over the years is an absolutely unbelievable undertaking that deserves our unrestricted admiration. Some of the best computer experts at the University of Lund, the NCI, and the NCBI have since spent considerable time and effort developing the database further and making it as reliable, fast, and user-friendly as possible. Its main feature is the almost unlimited possibility of searching for combinations of any of the following parameters: structural and numerical abnormalities, breakpoints, sole anomalies, number of clones, morphology, topography, recurrent aberrations, and patient characteristics such as age, sex, hereditary disorder, geographic location, and previous tumor and treatment. Links to the original publications (“Case Info”, “Ref. Info”), to PubMed, and to the major molecular genetic databases also provide the hub for further journeys into the (cyto)geneticist’s cyberspace.
O.A. Haas (✉) Children’s Cancer Research Institute (CCRI), St. Anna Children’s Hospital, Kinderspitalgasse 6, A-1090 Vienna, Austria
The most important issue for the unacquainted user of any database is, however, how easily and reliably she or he can obtain the required information. In the case of Mitelman’s database, the retrieval strategies are simple and straightforward. However, some background knowledge about how the data were generated and assembled, and about some peculiarities of the ISCN nomenclature itself, definitely helps to increase the success rate of particular search strings. In this context, one always needs to keep in mind that these seemingly exact data actually derive from the subjective interpretation of rearrangements and assignment of breakpoints, which over time were also influenced and modified by the (subconscious) adaptation and inclusion of the more precise results of FISH and molecular genetic studies. Moreover, the published cytogenetic data were neither altered nor corrected by Mitelman’s group. It therefore comes as no surprise that approximately 3% of the bands in the database do not actually exist in the ISCN nomenclature.
Let me provide a few examples to illustrate the above points. My first search for leukemias with a simple t(11;17) detected only two cases. Only the inclusion of the breakpoints or wild cards, t(11;17)(q23;*) or t(11;17)(*;*), revealed an appropriate number of cases (41 and 59, respectively), which reflects what one would expect. A more sophisticated search, namely “t(*;11)(*;q23), not t(4;11)(q21;q23) in acute lymphoblastic leukemias”, on the other hand, was immediately successful and returned a plausible 72 cases. The search for a common and highly specific abnormality, the t(15;17)(q22;q12), provides some clues about the difficulties that derive from the “fuzziness” of karyotype descriptions as well as from the encoding and retrieval mode of such cytogenetic data.
This particular reciprocal translocation is exclusively associated with acute promyelocytic leukemia (AML-M3), and the gene involved on chromosome 17, RARA, has been precisely mapped to 17q12. The assignment of the respective breakpoints, on the other hand, is difficult at the cytogenetic level alone and is therefore inconsistent, being placed at best at either 17q12, q21, or q12-q21. Depending on my search strategy, I came
across 1015 AML-M3 cases, 148 with t(15;17), 740 with t(15;17)(*;*), 181 with (q22;q12), 312 with (q22;q21), 649 with (q22;*), and 691 with (q22;*) when AML-M3 was not additionally specified.
One minor problem I came across in the database is that, unfortunately, one cannot search for specific tetrasomies (e.g. for +8, +8). However, this shortcoming is related to the programming strategy chosen in order to make all searches as flexible and, above all, as quick as possible.
Finally, to find out what the database can tell you about the cancer cytogeneticists themselves, I searched for cases with a t(12;21). Although this is the most common translocation in childhood acute lymphoblastic leukemia, it is virtually undetectable by conventional cytogenetic means. The cloning of the breakpoints in 1995 resulted in a boom in cases that were identified with RT-PCR and FISH screening. I came across 56 cases in the database. Not surprisingly, only two cases were reported before 1995, but already 11 in 1996 and 39 in the last two years, the latter mostly in the context of complex structural changes. This boom indicates to me that cancer cytogeneticists learn very fast and that, following the first description of such masked abnormalities, the precision of their analysis suddenly improves to an unbelievable degree. An alternative explanation is that cancer cytogeneticists are very flexible and have already crossed the Rubicon of conventional cytogenetics by adopting undeclared information obtained by other means.
What will be the future of Mitelman’s cancer cytogenetic database? Do we need another couple of hundred t(9;22)s (which in any case are not included as single, CML-associated abnormalities) or t(15;17)s in the database? Probably not in their present form. However, there are still many interesting problems and questions that can only be resolved and answered with chromosome-based information. These include the evaluation of the significance of secondary abnormalities and, of course, the deciphering of complex abnormalities in lymphomas and solid tumors. Cytogenetic data alone, however, become increasingly meaningless without the supplementation and direct integration of FISH and molecular genetic data. There is no doubt that Mitelman’s database provides a unique foundation for such a purpose. It is the one and only foundation on which such an extended and more universal cancer database can be built.
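Note on the wild-card searches. The difference between the bare t(11;17) query and its wild-card variants described earlier can be illustrated with a minimal sketch. It assumes that a query is matched against each recorded abnormality as a whole string and that `*` stands for exactly one chromosome or band field. Whether the database itself works this way is not stated in this review; the function names (`pattern_to_regex`, `search`) and the six toy cases are hypothetical and serve illustration only.

```python
import re

# Hypothetical toy records (NOT taken from the database): case id -> abnormalities.
CASES = {
    "case1": ["t(11;17)(q23;q21)"],
    "case2": ["t(11;17)(q23;q25)", "+8"],
    "case3": ["t(11;17)(q13;q21)"],
    "case4": ["t(11;17)"],            # breakpoints not reported
    "case5": ["t(4;11)(q21;q23)"],
    "case6": ["t(9;11)(p22;q23)"],
}

def pattern_to_regex(query):
    """Compile a query such as 't(11;17)(q23;*)' into an anchored regex.

    Assumption: every character is literal except '*', which matches one
    non-empty field, i.e. any run of characters other than ';', '(' or ')'.
    """
    literal_parts = [re.escape(part) for part in query.replace(" ", "").split("*")]
    return re.compile("^" + "[^;()]+".join(literal_parts) + "$")

def search(cases, query, exclude=None):
    """Return ids of cases with an abnormality matching `query`,
    skipping cases that carry an abnormality matching `exclude`."""
    wanted = pattern_to_regex(query)
    unwanted = pattern_to_regex(exclude) if exclude else None
    hits = []
    for case_id, abnormalities in cases.items():
        if unwanted and any(unwanted.match(a) for a in abnormalities):
            continue
        if any(wanted.match(a) for a in abnormalities):
            hits.append(case_id)
    return hits

print(search(CASES, "t(11;17)"))         # ['case4']
print(search(CASES, "t(11;17)(q23;*)"))  # ['case1', 'case2']
print(search(CASES, "t(11;17)(*;*)"))    # ['case1', 'case2', 'case3']
print(search(CASES, "t(*;11)(*;q23)", exclude="t(4;11)(q21;q23)"))  # ['case6']
```

Under these assumptions, a query without breakpoints retrieves only those records that were themselves entered without breakpoints, which would be one possible explanation for the very small number of hits obtained with the bare t(11;17).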