Developing a real-world database for oncology: a descriptive analysis of breast cancer in Argentina

4 Aug 2022
Guillermo Streich, Marcelo Blanco Villalba, Christian Cid, Guillermo F Bramuglia

Introduction: Real-World Data (RWD) are data obtained outside systematised, randomised clinical trials. Registries based on RWD allow information to be collected from a large number of patients and enable a significant number of professionals to participate. PrecisaXperta is a web platform developed for this purpose, parameterised for oncology and in operation for more than 2 years. Its design allows an epidemiological database to be built in real time and exported for processing.

Objective: To describe the characteristics and operation of this online data recording tool, explain how it was developed and analyse the quality of the information recorded, taking as an example the data obtained for breast cancer.

Materials and methods: Physicians, computer scientists and data science analysts participated in the development. The fields included cover, among others, patient data, history, educational level, diagnosis, staging, molecular markers, quality of life, types of treatment, progression and response, imaging, complications and adverse events. The treatment of data in terms of encryption, anonymisation, protection and validation is also explained. The breast cancer data selected for description were processed with medium-level statistical programmes, since the volume of data required to apply Big Data engines is not yet available.
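As an illustration of the kind of identifier protection mentioned above, the following is a minimal sketch of keyed pseudonymisation before export. It is not the platform's actual method, which the abstract does not specify. The field names `patient_id` and `full_name`, the function names and the key handling are all hypothetical. Strictly speaking, a keyed hash gives pseudonymisation rather than full anonymisation.

```python
import hashlib
import hmac


def pseudonymise(patient_id: str, secret_key: bytes) -> str:
    """Return a stable token for a patient identifier (assumed scheme: HMAC-SHA256)."""
    digest = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def strip_identifiers(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers (hypothetical field names) and add a token instead."""
    direct_identifiers = {"patient_id", "full_name"}
    cleaned = {k: v for k, v in record.items() if k not in direct_identifiers}
    cleaned["patient_token"] = pseudonymise(record["patient_id"], secret_key)
    return cleaned
```

Because the same identifier always maps to the same token under a given key, records of one patient can still be linked across visits without exposing the identifier itself.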

Results: From a total of 6,892 solid tumours, 1,892 were breast cancer, of which 1,654 were selected because they complied with a minimum data set defined ad hoc. Cases came from 13 provinces and showed a geolocation bias according to the place of practice of the professionals in the collaborative network. Missing data were most frequent in molecular markers (Ki67) and in the sequential order of some lines of treatment. Inconsistencies in dates and therapeutic schemes were also detected, and data curation made it possible to exclude them. The mean age of the patients was 55.3 ± 11.88 years. At the time of diagnosis, stage I (36.48%) and stage II (30.06%) predominated, with positive hormone receptors in 1,424 (89.96%) cases. The predominant treatments were hormonal (61.54%) and target-directed (30.85% for HER2(+) and 39.14% for HER2(−)), accompanied in most cases (85.9%) by some period of chemotherapy. Immunotherapy was much less represented (0.36%). Data were processed, homogenised, pooled and presented in an accessible form suitable for RWD analyses.
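The curation step described above, keeping only cases that meet a minimum data set and excluding records with inconsistent dates, could be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' procedure. The required fields, the record layout and the rule that a treatment cannot start before the diagnosis date are all hypothetical.

```python
from datetime import date

# Hypothetical minimum data set; the abstract does not list the actual fields.
REQUIRED_FIELDS = ("diagnosis_date", "stage", "hormone_receptor_status")


def has_minimum_data(record: dict) -> bool:
    """True when every required field is present and not None."""
    return all(record.get(field) is not None for field in REQUIRED_FIELDS)


def dates_consistent(record: dict) -> bool:
    """Assumed rule: no treatment line starts before the diagnosis date."""
    diagnosis = record["diagnosis_date"]
    starts = record.get("treatment_start_dates", [])
    return all(start >= diagnosis for start in starts)


def curate(records: list) -> tuple:
    """Split records into (kept, excluded) according to the two checks above."""
    kept, excluded = [], []
    for record in records:
        if has_minimum_data(record) and dates_consistent(record):
            kept.append(record)
        else:
            excluded.append(record)
    return kept, excluded
```

Returning the excluded records alongside the kept ones makes it possible to report how many cases were dropped and why, which is the kind of accounting the counts above (1,892 registered, 1,654 selected) rely on.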

Conclusions: PrecisaXperta fulfils the purpose of systematising the information, and its simple and intuitive interface facilitates data entry. The analysis of the breast cancer data makes clear that some fields should be mandatory in order to improve the quality of the information. The results describing the registered breast cancers give a surface view of the affected population and prepare the ground for designing future studies once local Big Data are available. With continuous improvement, online results and wider dissemination, this type of development will give participating professionals information on what happens in the real world and democratic access to epidemiological data that they can study, publish and investigate.
