J Med Syst (2012) 36:1901–1907 DOI 10.1007/s10916-010-9649-y
ORIGINAL PAPER
A Pilot Study on Image Analysis Techniques for Extracting Early Uterine Cervix Cancer Cell Features
Babak Sokouti & Siamak Haghipour & Ali Dastranj Tabrizi
Received: 30 September 2010 / Accepted: 28 December 2010 / Published online: 11 January 2011 © Springer Science+Business Media, LLC 2011
Abstract Cervical cancer is the second most common, and a preventable, form of cancer among women worldwide; its signs can be detected early in Pap smear screening of cervical cells. To improve the efficiency of expert diagnosis, the extraction of features from cervical cancer cell images needs to be automated by means of image processing techniques. This article employs image processing techniques to obtain the distinguishing features of normal, precancerous and cancerous cell images; specifically, we extract spectral features for cervical cancer cell detection. The images are first processed with noise reduction filters and Otsu thresholding, which prepare them for the 2-D Fourier and logarithmic transforms. By drawing a linear plot of the transformed image, we automatically extract features of normal, precancerous and cancerous cells that reflect their texture and morphology. These linear plots are unique to each group and separate the cells into the three groups of normal, precancerous and cancerous cells with 100% accuracy. The experiment shows that the unique features extracted for each cell provide evidence for diagnosis even in cytopathology images with complex overlapping cells, where nucleus and cytoplasm segmentation algorithms struggle.
Keywords Cytopathological cell images · Pap smear · Cervical cancer · Medical image analysis
B. Sokouti (*) · S. Haghipour Faculty of BioMedical Engineering, Islamic Azad University-Tabriz Branch, Tabriz, Iran
A. D. Tabrizi Department of Pathology, Tabriz University of Medical Sciences, Tabriz, Iran
Introduction Cervical cancer is a major issue in women's health today, especially in developing countries. It takes years for an abnormal cell to develop, so the earliest possible sign of abnormality is needed to allow the earliest treatment [1–3]. ThinPrep Pap smear screening is a useful early diagnostic process for cervical cancer [4, 5]. However, finding abnormal cells in ThinPrep Pap smears is a difficult and error-prone task for pathologists, which makes an automated screening tool desirable [6–8]. Most research on automatic cervical screening tries to segment the nucleus and cytoplasm accurately in order to detect abnormal cells. Yet even with 100% segmentation accuracy, in the presence of blood, inflammatory cells or complex overlapping cells, cell shape features may fail to show the differences between normal and abnormal cells. Gray level co-occurrence matrix (GLCM) textural features were used for nuclear segmentation, achieving 3.3% misclassification on a data set of 61 cells [9]. Geometric active contours were used for segmentation, together with an automatic circular decomposition method for segmenting connected cells [10]. Moving k-means clustering and a modified seed-based region growing (MSBRG) algorithm were then applied to automate edge detection, which was reported to be unstable in the presence of noise [11]. A region-growing-based feature extraction (RGBFE) technique was proposed to extract cervical cell features and predict the cell stage with an accuracy of 97.5% [12, 13]. To overcome these segmentation problems, we propose a new approach, built from known techniques, to distinguish normal, precancerous (LSIL: low-grade squamous intraepithelial lesion) and cancerous (HSIL: high-grade squamous intraepithelial lesion) cells. It takes advantage of spectral properties and avoids segmentation difficulties. First, each cell image is cropped manually from a cytopathological ThinPrep image and used as the input to the image processing pipeline shown in the block diagram. The output is a linear plot for each cell, and these plots show three distinct patterns for normal, precancerous and cancerous cells. The method is described in the second section; the third section discusses the results and compares them with other methods. Finally, the conclusion summarizes the work and outlines future work on this method.
Materials and methods The images of uterine cervical cells are captured by a high-resolution, computer-controlled QImaging GO3 3 MB digital camera mounted on an Olympus BX40 microscope at Alzahra Hospital, Tabriz, Iran. The digitized images, 1024 × 768 pixels in size, are saved in JPEG format without applying a compression factor, since image quality is important at this first stage. The pathologist then crops a single cell from each ThinPrep Pap smear screening image so that its grey-scale properties can be extracted. In total, 250 single-cell images randomly obtained from 20 patients (84 normal, 83 LSIL, 83 HSIL) are captured and cropped. Thirty (30) features of the linear plots can be extracted from the final images by the image processing system and its feature extraction algorithms; these features are intended for future work on clustering the cells with artificial neural networks. A cell consists of a nucleus surrounded by cytoplasm. Traditionally, a pathologist evaluates the cytoplasm and the background of the slide. The abnormality features are described as follows. Size: the nucleus is enlarged relative to the cytoplasm. Shape: a smooth, circular or oval outline belongs to normal nuclei. Texture: rough textures belong to abnormal nuclei. Chromaticity: abnormal nuclei are darker than normal ones. The image processing block diagram for ThinPrep images of cervical cells is shown in Fig. 1. The aim of the proposed pipeline is to produce a plot that shows the distribution of nuclei and cytoplasm. Medical images contain two kinds of noise distribution: noise spread evenly over the whole image, and noise concentrated mainly around the primitives. The median filter, a spatial filter whose response is based on ranking the pixels in a neighborhood, reduces both types of noise.
It replaces the value of each pixel by the median of the grey levels in that pixel's neighborhood. Here a 2-D median filter with a 5 × 5 neighborhood is applied, so each output pixel contains the median value of the corresponding 5 × 5 window. In the third step, the image is sharpened by subtracting a blurred version of the image from the image itself (unsharp masking), which helps distinguish between nuclei, cytoplasm and other components:
$$f_s(x, y) = f(x, y) - \bar{f}(x, y) \tag{1}$$
where $f_s(x, y)$ is the sharpened image obtained by unsharp masking and $\bar{f}(x, y)$ is a blurred version of $f(x, y)$. The blur is a 15 × 15 Gaussian low-pass filter with a standard deviation of 10 pixels, and the scaling factor is 0.9. After the median and unsharp mask filters, the images are segmented into nuclei and background (the cytoplasm together with the image background) by Otsu thresholding. Medical images typically lack a clear-cut set of histogram peaks corresponding to distinct phases or structures in the image; even in such cases, direct thresholding of the image can be useful [16, 17]. The texture and chromaticity features are used to evaluate the normality and abnormality (precancerous and cancerous) of the nuclei, since texture and large dark nuclei are, on their own, reliable indicators of nuclear normality and abnormality. Texture features can refer to frequency-space relations, while chromaticity features are caused by increased cellular density or abnormal intracellular keratins [14, 15]. These features can also be more distinctive than the outcome of image segmentation. In this step, thresholding is used to obtain a binary version of the image; a threshold is a useful means of separating objects from the background. With T as a global threshold, the thresholded image is
$$g(x, y) = \begin{cases} 1 & f(x, y) > T \\ 0 & \text{otherwise} \end{cases} \tag{2}$$
After thresholding, the cytoplasm is represented as white and the nuclei as black. The threshold T is computed by Otsu's method. Each M × N cell image can be represented as a function $f(x, y)$, $x = 0, \ldots, M-1$, $y = 0, \ldots, N-1$. The 2-D Fourier transform of f, denoted F(u, v), is
$$F(u, v) = \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x, y)\, e^{-j 2\pi \left( \frac{ux}{M} + \frac{vy}{N} \right)} \tag{3}$$
for $u = 0, \ldots, M-1$ and $v = 0, \ldots, N-1$. The transform expands the image in sines and cosines whose frequencies are determined by u and v, and it is used to convert the thresholded images into the spatial frequency domain. To place the zero-frequency component at the centre of the frequency space, the image is shifted before the transformation is applied.
Fig. 1 Block diagram of the image analysis system for feature extraction of cell images
Because the result of the transform is complex, the log transform is applied to the Fourier magnitude; its general form is
$$s = c \log(1 + r) \tag{4}$$
where c is a constant (here c = 1) and r ≥ 0. This transformation maps a narrow range of low grey-level values in the input image into a wider range of output levels, expanding the values of dark pixels while compressing the higher values. It thereby reduces the dominance of the DC value over all other pixel values and reveals more detail than before. At this stage, the properties of the three types of cells become detectable. A mean filter is then used to reduce the noise remaining in the resulting images. It computes the average value of the corrupted image f(x, y) over the area $S_{xy}$, the set of coordinates in a rectangular subimage window of size m × n centred at (x, y):
$$\hat{g}(x, y) = \frac{1}{mn} \sum_{(s,t) \in S_{xy}} f(s, t) \tag{5}$$
A 5 × 5 mean filter reduces the noise, and applying it twice smooths the image; a single pass does not reduce the noise enough for automatic diagnosis.
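The spatial-domain preprocessing that precedes the Fourier transform (a 5 × 5 median filter, unsharp masking as in Eq. (1) with a 15 × 15 Gaussian of standard deviation 10, and Otsu thresholding as in Eq. (2)) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: images are lists of rows of grey levels, borders are handled by clamping coordinates, and because the paper does not say how its scaling factor of 0.9 enters Eq. (1), the hypothetical `scale` parameter multiplies the blurred image and defaults to 1 so that Eq. (1) holds as written. All function names are illustrative.

```python
import math
import statistics


def clamp(i, n):
    """Clamp index i to [0, n - 1] (replicate-border handling, an assumption)."""
    return min(max(i, 0), n - 1)


def median_filter(img, size=5):
    """Replace each pixel by the median grey level of its size x size neighbourhood."""
    h, w, r = len(img), len(img[0]), size // 2
    return [[statistics.median(img[clamp(y + dy, h)][clamp(x + dx, w)]
                               for dy in range(-r, r + 1) for dx in range(-r, r + 1))
             for x in range(w)] for y in range(h)]


def gaussian_kernel(size=15, sigma=10.0):
    """Normalised size x size Gaussian low-pass kernel (weights sum to 1)."""
    r = size // 2
    k = [[math.exp(-(dy * dy + dx * dx) / (2.0 * sigma * sigma))
          for dx in range(-r, r + 1)] for dy in range(-r, r + 1)]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]


def blur(img, kernel):
    """Filter img with a symmetric kernel (correlation equals convolution here)."""
    h, w, r = len(img), len(img[0]), len(kernel) // 2
    return [[sum(kernel[dy + r][dx + r] * img[clamp(y + dy, h)][clamp(x + dx, w)]
                 for dy in range(-r, r + 1) for dx in range(-r, r + 1))
             for x in range(w)] for y in range(h)]


def unsharp(img, size=15, sigma=10.0, scale=1.0):
    """Eq. (1): f_s = f - scale * blurred(f); `scale` is a hypothetical knob."""
    b = blur(img, gaussian_kernel(size, sigma))
    return [[f - scale * fb for f, fb in zip(row, brow)] for row, brow in zip(img, b)]


def otsu_threshold(values):
    """Return T maximising the between-class variance; class 1 is v > T."""
    vals = sorted(values)
    n, total = len(vals), sum(vals)
    best_t, best_var, cum = vals[0], -1.0, 0.0
    for i in range(n - 1):
        cum += vals[i]
        if vals[i] == vals[i + 1]:
            continue  # only split between distinct grey levels
        w0 = (i + 1) / n
        mu0 = cum / (i + 1)
        mu1 = (total - cum) / (n - i - 1)
        var_between = w0 * (1.0 - w0) * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_t, best_var = vals[i], var_between
    return best_t


def preprocess(img):
    """Median filter -> unsharp mask -> Otsu threshold, giving the binary g of Eq. (2)."""
    sharp = unsharp(median_filter(img, 5))
    t = otsu_threshold([v for row in sharp for v in row])
    return [[1 if v > t else 0 for v in row] for row in sharp]
```

For a synthetic 16 × 16 image with a dark disc (a stand-in for a nucleus) on a bright background, `preprocess` maps the disc centre to 0 and the image corner to 1, matching the convention that nuclei appear black and the rest white after Eq. (2).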
The two-dimensional images are then analyzed with one-dimensional techniques. To draw the linear plot of an image, we extract the values of its centre row, where the values have high intensity, and plot them against their positions. The vertical axis shows the normalized value and the horizontal axis shows the corresponding pixel position, as in Figs. 3, 4 and 5.
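The spectral stage (Eqs. (3)–(5)) followed by the centre-row linear plot can be sketched as below. The sketch assumes even image dimensions and centres the spectrum by multiplying the image by (−1)^(x+y) before the transform, which is one standard way of moving the zero frequency to (M/2, N/2). It uses a direct, separable DFT instead of an FFT for readability, and the min-max normalization of the plotted row is an assumption, since the paper only says the values are normalized. All function names are illustrative.

```python
import cmath
import math


def dft_1d(seq):
    """Direct 1-D DFT: X[u] = sum_k seq[k] * exp(-j 2 pi u k / n)."""
    n = len(seq)
    return [sum(seq[k] * cmath.exp(-2j * math.pi * u * k / n) for k in range(n))
            for u in range(n)]


def centred_spectrum(img):
    """|F(u, v)| from Eq. (3), with zero frequency moved to (M // 2, N // 2)."""
    m, n = len(img), len(img[0])
    # Multiplying by (-1)^(x + y) shifts the transform by (M/2, N/2) for even M, N.
    shifted = [[img[x][y] * (-1) ** (x + y) for y in range(n)] for x in range(m)]
    rows = [dft_1d(r) for r in shifted]                                # sum over y
    cols = [dft_1d([rows[x][v] for x in range(m)]) for v in range(n)]  # sum over x
    return [[abs(cols[v][u]) for v in range(n)] for u in range(m)]


def log_transform(img, c=1.0):
    """Eq. (4): s = c * log(1 + r), applied to the non-negative magnitude."""
    return [[c * math.log1p(r) for r in row] for row in img]


def mean_filter(img, size=5):
    """Eq. (5): average over a size x size window, clamping at the borders."""
    h, w, r = len(img), len(img[0]), size // 2

    def px(y, x):
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [[sum(px(y + dy, x + dx) for dy in range(-r, r + 1)
                 for dx in range(-r, r + 1)) / (size * size)
             for x in range(w)] for y in range(h)]


def linear_plot(binary_img):
    """Spectrum -> log -> 5 x 5 mean filter twice -> normalised centre row."""
    s = mean_filter(mean_filter(log_transform(centred_spectrum(binary_img))))
    row = s[len(s) // 2]
    lo, hi = min(row), max(row)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in row]
```

For a uniform binary image, all spectral energy sits at the DC term, so the resulting plot peaks at the centre position and falls off symmetrically on both sides; a real cell image spreads energy into other frequencies and produces the richer shapes discussed in the results.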
Results and discussion The proposed approach to cervical cancer detection in ThinPrep Pap smear images has been evaluated on a database of 250 single-cell images randomly obtained from 20 patients (84 normal, 83 LSIL, 83 HSIL). First, all images are preprocessed to remove noise and normalize the intensity. Then, to classify each cell as normal, LSIL or HSIL, the noise reduction filters and the transformation to the spatial frequency domain are applied, and the algorithm extracts the linear plot of each cell image, which is unique to its type. Two goals were pursued in processing the cell images: reducing the noise, and obtaining the linear plot used to review and analyze the method. To accomplish both goals simultaneously, we took advantage of the radial symmetry of the image and took the median of the pixels according to their radius. Starting from the centre of the intermediate image, the median is taken over the pixels 5 pixels from the centre, then over the pixels 6 pixels from the centre, and so on along the radius of the image. The results are then mirrored about the DC point. Noise reduction is performed by averaging the square roots of the pixel values, so that the noise reduction is applied as a function of radius. The images resulting from each processing step are presented in Fig. 2. As shown in Fig. 3, for normal cells a local minimum can be seen before the symmetry point, at the 7th pixel from the centre of symmetry (red arrow), corresponding to an approximate frequency of 55 mm−1, and a local maximum after the symmetry point, at the 16th pixel (green arrow), corresponding to approximately 127 mm−1. This shows that the features of normal cells repeat regularly, a property not seen in abnormal cells. Similarly, according to the linear plots shown in Fig. 4, an ascending graph without a local minimum or maximum is seen for the fully cancerous cells. The linear plot shown in Fig. 5 shows a condition between the previous two cases, corresponding to LSIL or precancerous cells, unlike the normal cells. In this plot there is no local maximum, indicating that these cells are at risk of progressing to HSIL and that this progression is therefore preventable. The features shown in Figs. 3, 4 and 5 come from 15 cells, five each of normal, LSIL and HSIL cells. They indicate that each cell type, whether normal, LSIL or HSIL, has a unique frequency chart, giving 100% accuracy in comparison with other methods. Compared in particular with nuclei and cytoplasm separation methods, whose accuracy is 80%–95% [9–13], the proposed method can be regarded as very accurate in the field of cervical cancer diagnosis. Here, the accuracy of a measurement system or algorithm is the degree of closeness of the proposed diagnosis to the pathologist's diagnosis. The accuracy of 100% is stated without statistical plots because the cells were classified manually on the basis of the structure of their linear plots. The plots can further be used as input to an intelligent neural network system for automatic classification. Fig. 6 relates to histopathological images containing several cells, on which we ran the same image processing pipeline; this revealed two important facts. First, the proposed algorithm can detect the presence of abnormal cells without the need to separate nuclei and cytoplasm. Second, even when cells overlap, it can still detect the presence of abnormal cells by representing the cell features in the form of linear plots.
Fig. 2 Result images from each image processing algorithm step
Fig. 3 Linear plot of 5 normal cells
Fig. 4 Linear plot of 5 LSIL cells
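The radial median step described in the results, which takes the median of the pixels at each radius from the centre of the centred spectrum and mirrors the result about the DC point, can be sketched as follows. The paper does not fully specify the ring width, the starting radius (it mentions starting at 5 pixels), or how the square-root averaging is combined with it, so this sketch assumes rings of pixels whose rounded distance from (M // 2, N // 2) equals each integer radius, starts at radius 0, and omits the square-root step. `radial_median_profile` is a hypothetical name.

```python
import math
import statistics


def radial_median_profile(spectrum, r_max=None):
    """Median over rings of equal (rounded) radius, mirrored about the DC point."""
    m, n = len(spectrum), len(spectrum[0])
    cx, cy = m // 2, n // 2
    # Rings up to min(cx, cy) always contain the pixel (cx - r, cy), so none is empty.
    limit = min(cx, cy)
    r_max = limit if r_max is None else min(r_max, limit)
    rings = {r: [] for r in range(r_max + 1)}
    for x in range(m):
        for y in range(n):
            r = round(math.hypot(x - cx, y - cy))
            if r <= r_max:
                rings[r].append(spectrum[x][y])
    # One median per radius, from the DC point outwards.
    half = [statistics.median(rings[r]) for r in range(r_max + 1)]
    # Mirror so the DC value sits in the middle: [h_R, ..., h_1, h_0, h_1, ..., h_R].
    return half[:0:-1] + half
```

Because each plotted value is a median over a whole ring rather than a single pixel on the centre row, isolated noisy pixels have little influence on the profile, which is the motivation the text gives for combining noise reduction with plot extraction.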
We conclude that the proposed algorithm, which incorporates feature selection, improves the cell separation process without nuclei and cytoplasm segmentation and improves the accuracy of separating the cell groups. The fact that the algorithm also works on overlapping cells of different types, with different features, suggests that the information content of the dataset is no longer limited by noise and that a separation accuracy of 100% is achievable.
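The manual reading of the linear plots reported in the results (a local minimum and a local maximum for normal cells, a minimum but no maximum for LSIL cells, and no local extrema for HSIL cells) could be made explicit as a simple rule, for example as a baseline before the planned neural-network classification. The sketch below is hypothetical and is not the authors' procedure: it is meant to be applied to one half of a plot (a mirrored plot always has an extremum at the DC point), it uses strict neighbour comparisons so flat plateaus are not counted, and it ignores the specific positions (7th and 16th pixel) reported for normal cells.

```python
def local_extrema(profile):
    """Indices of strict local minima and maxima in a 1-D profile."""
    minima, maxima = [], []
    for i in range(1, len(profile) - 1):
        left, mid, right = profile[i - 1], profile[i], profile[i + 1]
        if mid < left and mid < right:
            minima.append(i)
        elif mid > left and mid > right:
            maxima.append(i)
    return minima, maxima


def label_plot(half_profile):
    """Hypothetical rule mirroring the qualitative plot shapes described in the text."""
    minima, maxima = local_extrema(half_profile)
    if minima and maxima:
        return "normal"
    if minima:
        return "LSIL"          # a minimum but no local maximum
    if not maxima:
        return "HSIL"          # monotone plot: no local minimum or maximum
    return "undetermined"      # maximum only: not one of the described patterns


print(label_plot([5, 3, 4, 6, 5]))  # normal
print(label_plot([1, 2, 3, 4, 5]))  # HSIL
```

Real profiles carry residual noise that can create spurious extrema, so in practice a rule like this would likely need smoothing or a minimum-prominence tolerance before it could be relied on.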
Fig. 5 Linear plot of 5 HSIL cells
Fig. 6 Linear plot of 5 overlapping histopathological cells
Conclusion Separating nuclei and cytoplasm by means of image analysis is very difficult. Much research has been done on Pap smear classification, and part of it has focused on obtaining features from the frequency domain of the images. In this paper, a novel method for analyzing cervical cell images is offered that uses features obtained from the images and from graphs of the linear spectrum in the Fourier domain. With it, images can be classified into three distinct groups: normal, LSIL and HSIL. It is shown that features obtained from frequency analysis and the Fourier transform can be used to classify single-cell images. Moreover, in cases with overlapping cells, or where separating nuclei and cytoplasm is difficult, these cells can still be classified by the proposed algorithm. Although we found no shortcomings while working on this small set of cell images, other features obtained from SP-D immunostaining (brown color) of epithelial cells will need to be included. In future work, we are working on broader samples, which we intend to classify with artificial neural networks.
Conflict of interest The authors had no competing interests to declare in relation to this article.
References
1. Othman, N. H., Cancer of the cervix – from bleak past to bright future. Pustaka Reka Publishing Company, Kelantan, Malaysia, 2003.
2. Linasmita, V., Cervical cancer screening in Thailand. FHI-satellite meeting on the prevention and early detection of cervical cancer in the Asia and the Pacific region, 2006.
3. Bazoon, M., Stacey, D. A., Chen, C., et al., A hierarchical artificial neural network system for the classification of cervical cells. IEEE World Congress on Computational Intelligence, IEEE International Conference on Neural Networks 3526:3525–3529, 1994.
4. Kemp, R. A., MacAulay, C., Garner, D., et al., Detection of malignancy associated changes in cervical cell nuclei using feedforward neural networks. Analytical Cellular Pathology 14:31–40, 1997.
5. Pernick, B. J., Kopp, R. E., Lisa, J., et al., Screening of cervical cytological samples using coherent optical processing. Part 1 (ET). Appl. Opt. 17:21, 1978.
6. Ricketts, I. W., Cervical cell image inspection – a task for artificial neural networks. Network: Computation in Neural Systems 3:15–18, 1992.
7. Ricketts, I. W., Banda-Gamboa, H., Cairns, A. Y., et al., Towards the automated prescreening of cervical smears. IEE Colloquium on Applications of Image Processing in Mass Health Screening, 7/1–7/4, 1992.
8. McKenna, S. J., Ricketts, I. W., Cairns, A. Y., et al., A comparison of neural network architectures for cervical cell classification. Third International Conference on Artificial Neural Networks, 105–109, 1993.
9. Walker, R. F., Jackway, P., Lovell, B., et al., Classification of cervical cell nuclei using morphological segmentation and textural feature extraction. Second Australian and New Zealand Conference on Intelligent Information Systems, 297–301, 1994.
10. Harandi, N. M., Sadri, S., Moghaddam, N. A., and Amirfattahi, R., An automated method for segmentation of epithelial cervical cells in images of ThinPrep. Journal of Medical Systems 34(6):1043–1058, 2010.
11. Mat-Isa, N. A., Automated edge detection technique for Pap smear images using moving K-means clustering and modified seed based region growing algorithm. Int J Comput Internet Manage 13(3):45–59, 2005.
12. Mat-Isa, N. A., Mashor, M. Y., and Othman, N. H., An automated cervical pre-cancerous diagnostic system. Artificial Intelligence in Medicine 42:1–11, 2008.
13. Mashor, M. Y., Hybrid training algorithm for RBF network. Int J Comput Internet Manage 8(2):50–65, 2000.
14. McKenna, S. J., Ricketts, I. W., Cairns, A. Y., et al., Cascade-correlation neural networks for the classification of cervical cells. IEE Colloquium on Neural Networks for Image Processing Applications, 5/1–5/4, 1992.
15. Ricketts, I. W., Banda-Gamboa, H., Cairns, A. Y., et al., Automatic classification of cervical cells using the frequency domain. IEE Colloquium on Applications of Image Processing in Mass Health Screening, 9/1–9/4, 1992.
16. Russ, J. C., The Image Processing Handbook, Fifth edition, 2009.
17. Gonzalez, R. C., and Woods, R. E., Digital Image Processing, 2007.