Association Rules of Prognostic Variables for Survival in a Randomized Comparison of Treatments for Prostatic Cancer

In this paper, the data were analyzed using the association rules technique of data mining. The data for 506 patients consist of an identification number, the stage of the tumour, a code for the treatment to which the patient was assigned, the date of randomization, the total months of follow-up since randomization, an indicator for the survival status or cause of death, and the values of twelve pretreatment covariates. The goal of the analysis is to compare the treatments with respect to the survival of the patients. Since this was a randomized study, it would ordinarily not be necessary to adjust for the values of the pretreatment covariates. However, in such studies it is advisable to examine the prognostic significance of the covariates and to confirm that they are balanced across treatment groups. In addition, the analyst should look for important treatment-covariate interactions, which might lead to the definition of subsets of patients in which treatment differences were significantly more marked or even reversed.
Keywords: data mining, prognostic, cancer, randomized clinical trial.


I. INTRODUCTION
A prognosis is the doctor's best estimate of how cancer will affect someone. Many factors can affect a person's prognosis. In general, survival statistics are used to estimate the percentage of people with cancer who will live at least a certain amount of time (such as 1, 3, 5, or 10 years) after their diagnosis [1], [2]. The independent prognostic factors affecting survival were assessed in 240 men undergoing treatment for metastatic prostate cancer as part of a randomized clinical trial comparing the gonadotropin-releasing hormone analogue Zoladex (goserelin acetate implant) with castration [3].
In multivariate analysis, the most highly significant predictors were the presence or absence of bone pain, serum testosterone levels, serum alkaline phosphatase levels, and performance status. Patients with all four factors favorable for survival had a 2-year survival rate of 84%, compared with only 8% for patients with none of the four factors favorable for survival; no other factors were significant [3].

II. RESEARCH METHODOLOGY
The material was data from Andrews and Herzberg (1985) [4]. The data concerned a comparison of treatments for prostatic cancer. The layout of the data is shown in Table 1.
The variables involved are as follows: Pat. No. is the patient number. Mos FU is the number of complete months of follow-up. SBP is systolic blood pressure. DBP is diastolic blood pressure. EKG is electrocardiogram. HG is serum haemoglobin. SZ is the size of the primary tumour. SG is a combined index of tumour stage and histologic grade. AP is serum prostatic acid phosphatase in King-Armstrong units. BM is bone metastases. The variables are also described on the next page.
The mining process was conducted with RapidMiner version 5.2 software [5]. The aim of applying association rules was to detect relationships or associations between specific values of categorical variables in large data sets [6], [7]. This technique allows analysts and researchers to uncover hidden patterns in large data sets. More than 405960 association rules were generated from the data; after a minimum criterion of 0.8 was imposed on both support and confidence, 221 rules remained, as can be seen in Figure 3.

IV. CONCLUSION
With a minimum criterion of 0.8, the association rules output comprised 221 rules, of which 10 could be considered the most probable associations between premises and conclusion, because their support and confidence were the highest, at 95.5% each. Notably, among these ten rules with the highest support and confidence, the conclusion for every premise always contained age, weight index, and the number of complete months of follow-up of the patients.
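For reference, the two measures thresholded at 0.8 above are conventionally defined as sketched below. The symbols D (the set of patient records), t (a single record), X (the premises) and Y (the conclusion) are notation introduced here for illustration and do not appear in the source; that RapidMiner computes the measures in exactly this form is an assumption.

```latex
\[
\operatorname{supp}(X \Rightarrow Y)
  = \frac{\lvert \{\, t \in D : X \cup Y \subseteq t \,\} \rvert}{\lvert D \rvert},
\qquad
\operatorname{conf}(X \Rightarrow Y)
  = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)} .
\]
```

Under these definitions, a rule passes the minimum criterion when both values are at least 0.8, and the confidence of a rule can never be smaller than its support.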

Figure 1. Taxonomy of Data Mining Tasks

Figure 2. Association Rules architecture for prognostic variables data set

The highest support and the highest confidence were found in cases 212 to 221. If the premise was any one of (Stage), (Rx), (Pat no), (Date on-study), or (AP), then the conclusion was Age,yrs, Wt, Mos FU. That is, when the premise was the stage of prostatic cancer, Rx (the treatment), the patient number, the date on-study, or AP (serum prostatic acid phosphatase), the conclusion was the age of the patient, the weight index of the patient, and the number of complete months of follow-up. Each of these rules had a support and a confidence of 0.955, or 95.5%. Likewise, if the premise was (Stage, Rx), then the conclusion was Age,yrs, Wt, Mos FU; that is, the stage and the treatment of the cancer were associated with the age, the weight index, and the number of complete months of follow-up, with a support and a confidence of 0.955, or 95.5%. If the premise was Stage only, then the conclusion was (Pat no, Age,yrs, Wt, Mos FU); that is, the stage was associated with the patient number, the age of the patient, the weight index of the patient, and the number of complete months of follow-up. If the premise was the patient number, then the conclusion was the stage, the age, the weight index, and the number of complete months of follow-up.
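To make the reading of premises, conclusions, support and confidence concrete, the following is a minimal Python sketch of how rules can be enumerated and filtered with a minimum criterion of 0.8. It is not the RapidMiner procedure used in the study: the function names `support` and `mine_rules`, the brute-force enumeration, and the toy records with items "A", "B" and "C" are all assumptions made purely for illustration and are unrelated to the prostatic cancer data.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def mine_rules(transactions, min_support=0.8, min_confidence=0.8):
    """Brute-force rule enumeration; only suitable for toy-sized data."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    rules = []
    for size in range(2, len(items) + 1):
        for itemset in combinations(items, size):
            supp = support(itemset, transactions)
            # Skip itemsets that never occur or fall below the threshold.
            if supp == 0 or supp < min_support:
                continue
            # Split the itemset into every premise -> conclusion pair.
            for k in range(1, size):
                for premise in combinations(itemset, k):
                    conclusion = tuple(i for i in itemset if i not in premise)
                    conf = supp / support(premise, transactions)
                    if conf >= min_confidence:
                        rules.append((premise, conclusion, supp, conf))
    return rules


# Hypothetical toy records (illustration only, not the study data).
toy = [{"A", "B", "C"}, {"A", "B"}, {"A", "B", "C"}, {"A", "B", "C"}, {"A", "C"}]

for premise, conclusion, supp, conf in mine_rules(toy):
    print(premise, "->", conclusion, f"support={supp:.3f}", f"confidence={conf:.3f}")
```

Each printed line has the same shape as the rules discussed above: a premise, a conclusion, and the support and confidence that the pair attains. A real analysis would use a frequent-itemset algorithm rather than exhaustive enumeration, since the number of candidate itemsets grows exponentially with the number of items.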