Day 3 :
Keynote Forum
Ajit Kumar Roy
Central Agricultural University-Tripura, India
Keynote: Impact of big data analytics on business, economy, health care and society
Time : 09:00-09:30
Biography:
Ajit Kumar Roy obtained his MSc degree in Statistics and joined the Agricultural Research Service (ARS) of the Indian Council of Agricultural Research (ICAR) as a Scientist in 1976. He has edited eighteen books and several conference proceedings covering statistics, bioinformatics, economics, and analytics applications in aquaculture, fisheries, agriculture and allied fields, and has published over 100 articles in refereed journals and conference proceedings. He is a highly acclaimed researcher and consultant. His recent popular books are ‘Applied Big Data Analytics’; ‘Self Learning of Bioinformatics Online’; ‘Applied Bioinformatics, Statistics and Economics in Fisheries Research’; and ‘Applied Computational Biology and Statistics in Biotechnology and Bioinformatics’. He is widely recognized as an expert research scientist, teacher, author, and hands-on leader in advanced analytics. He has served as National Consultant (Impact Assessment), Consultant (Statistics), Computer Specialist and Principal Scientist at various organizations at national and international levels. Presently he is a Visiting Professor at four Indian universities.
Abstract:
Big data exists in a wide variety of data-intensive areas such as atmospheric science, genome research, astronomical studies and network traffic monitoring. Huge volumes of data are created every day by the interactions of billions of people using computers, GPS devices, cell phones, sensors and medical devices. Because of the tremendous amount of data generated daily by business, research and science, big data is everywhere and represents huge opportunities for those who can use it effectively. In the past, this information was simply ignored and opportunities were missed. In the big data era, realizing its great importance, many analytical organizations are moving beyond process improvements to find hidden information buried in big data and make the best use of it. The growing technological ability to collect and analyze massive sets of information, known as big data, could lead to revolutionary changes in business, political and social enterprises, according to a new survey of internet experts. To date, much work has been done on big data covering tools, software, platforms, analytics and more, and many companies are already using these successfully. National and international organizations are entering the areas of application of big data analytics for development, education, disaster management, health care, natural resource management and other fields for the benefit of society. This presentation therefore attempts to compile and document real use cases, benefits, advantages, impacts and future challenges of big data. To evaluate the effectiveness of harnessing big data for development, UN Global Pulse has worked on several research projects in collaboration with public and private partners; this work demonstrates how big data analytics can benefit policy makers in different contexts, from monitoring early indicators of unemployment hikes to tracking fluctuations in commodity prices before they are recorded in official statistics. According to thought leaders, big data is already showing that potential in areas as far-ranging as genetic mapping and personalized e-commerce, and big data, backed by the exponential growth in processing power and software technologies such as Hadoop, is allowing organizations “to make decisions that simply could not be made before, to handle all sorts of data questions.” That will have a resounding impact. Big data will affect every industry and every process. Its influence will be felt in business planning, research, sales, production and elsewhere, and this amounts to nothing less than a new industrial revolution. The advances in capturing and analyzing big data allow us to decode human DNA in minutes, find cures for cancer, accurately predict human behaviour, foil terrorist attacks, pinpoint marketing efforts, prevent diseases and much more. Finally, there are increasing concerns about privacy, as many have been expressed about how retailers, credit card companies, search engine providers and mail or social media companies use our private information. The presentation focuses on real-life implementations of Big Data Analytics and discusses their impact in detail, providing a bold vision from leading innovators across the data-driven spectrum and helping the audience gain fresh insights.
- Track 2: Statistical and Computing Methods; Track 5: Bioinformatics; Track 6: Computer Science and Systems Biology
Location: Texas B
Chair
Francisco Louzada
University of Sao Paulo
Brazil
Co-Chair
Abdel-Salam Gomaa Abdel-Salam
Qatar University
Qatar
Session Introduction
Joel Michalek
University of Texas
USA
Title: p53-based strategy to reduce hematological toxicity of chemotherapy: A pilot study
Time : 11:25-11:45
Biography:
Joel E Michalek completed his PhD from Wayne State University. He has a broad background in biostatistics pertaining to theory and methods, preclinical and clinical trials, and epidemiology. He has written protocols and grants, analyzed data, and co-authored manuscripts arising from clinical studies in surgery, emergency medicine, cancer, and pediatrics and was formerly Principal Investigator of the Air Force Health Study, a 20-year prospective epidemiological study of veterans who sprayed Agent Orange and other herbicides in Vietnam. He has authored 180 journal articles and two book chapters.
Abstract:
p53 activation is the primary mechanism underlying pathological responses to DNA-damaging agents such as chemotherapy and radiotherapy. Study objectives were to: (1) define the lowest safe dose of arsenic trioxide that blocks p53 activation in patients and (2) assess the potential of low-dose arsenic (LDA) to decrease hematological toxicity from chemotherapy. Patients scheduled to receive a minimum of 4 cycles of myelosuppressive chemotherapy were eligible. For objective 1, dose escalation of LDA started at 0.005 mg/kg/day for 3 days. This dose satisfied objective 1 and was administered before chemotherapy cycles 2, 4 and 6 for objective 2. CBC was compared between the cycles with and without LDA pretreatment. p53 level in peripheral lymphocytes was measured on day 1 of each cycle by ELISA assay. Subjects received arsenic at cycles 2, 4 and 6 and no arsenic at cycles 1, 3 and 5. Of a total of 30 evaluable patients, 26 were treated with 3-week cycle regimens and form the basis of our analyses. The mean white blood cell, hemoglobin and absolute neutrophil counts were significantly higher in the suppressed group relative to the activated group. These data support the proof of concept that suppression of p53 could lead to protection of normal tissue and bone marrow in patients receiving chemotherapy.
Hsin-Hsiung Huang
University of Central Florida
USA
Title: The out-of-place testing for genome comparison
Time : 11:45-12:05
Biography:
Hsin-Hsiung Huang completed his PhD at the University of Illinois at Chicago and spent a year as a Visiting Assistant Professor in the Department of Statistics at the University of Central Florida. He is now a tenure-track Assistant Professor in the same department. He has published 5 papers in reputed journals.
Abstract:
The out-of-place distance measure with the alignment-free n-gram based method has been successfully applied to automatic categorization of text and natural languages in real time. Like k-mers, n-grams are sub-sequences of length ‘n’ from a given sequence, but the ways of counting n-grams and k-mers differ. Moreover, its performance on genome sequences and the appropriate choice of ‘n’ have been unclear. In this study, the author proposes a symmetric version of the out-of-place measure, a non-parametric out-of-place measure, and an approach for finding the optimal range of ‘n’ to construct a phylogenetic tree with the symmetric out-of-place measures. This approach is then applied to four mitochondrial genome sequence datasets. The resulting phylogenetic trees are similar to the standard biological classification, showing that the proposed method is a powerful tool for phylogenetic analysis in terms of both classification accuracy and computational efficiency.
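For orientation, a minimal sketch of the classical out-of-place measure on DNA n-gram rank profiles is shown below. This is the textbook asymmetric form (whose asymmetry motivates the symmetric variant proposed in the talk); function names and the penalty convention are illustrative, not the author's implementation.

```python
from collections import Counter

def ngram_ranks(seq, n):
    """Rank the n-grams of `seq` by decreasing frequency (rank 0 = most frequent)."""
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    ordered = sorted(counts, key=counts.get, reverse=True)
    return {g: r for r, g in enumerate(ordered)}

def out_of_place(seq_a, seq_b, n=3, max_penalty=None):
    """Sum of rank displacements of seq_a's n-grams in seq_b's profile (asymmetric)."""
    ra, rb = ngram_ranks(seq_a, n), ngram_ranks(seq_b, n)
    if max_penalty is None:
        max_penalty = max(len(ra), len(rb))  # penalty for n-grams absent from seq_b
    return sum(abs(r - rb.get(g, max_penalty)) for g, r in ra.items())

print(out_of_place("ACGTACGTGGT", "ACGTTTACGGG", n=2))
```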
Dongmei Li
University of Rochester
USA
Title: An evaluation of statistical methods for DNA methylation microarray data analysis
Time : 12:05-12:25
Biography:
Dongmei Li completed her PhD in Biostatistics in the Department of Statistics at The Ohio State University. She is currently an interim Associate Professor in the Department of Clinical & Translational Research at the University of Rochester School of Medicine and Dentistry. She has published more than 25 methodology and collaborative papers in reputed journals and has served as Co-Investigator or Statistician on multiple federally funded grants and contracts.
Abstract:
DNA methylation offers a process for elucidating how epigenetic information affects gene expression. β values and M values are commonly used to quantify DNA methylation. Statistical methods applicable to DNA methylation data analysis span a number of approaches, such as the Wilcoxon rank sum test, t-test, Kolmogorov-Smirnov test, permutation test, empirical Bayes method, and bump hunting method. Selection of an optimal statistical method, however, can be challenging when different methods generate inconsistent results for the same data set. We compared six statistical approaches relevant to DNA methylation microarray analysis in terms of false discovery rate control, statistical power, and stability through simulation studies and a real data example. Our results provide guidance for selecting an optimal statistical method under different scenarios. For DNA methylation data analysis purposes, similar results are obtained using either β or M values in terms of false discovery rate control, power, and stability. The empirical Bayes method is recommended for DNA methylation studies with small sample sizes. All methods are acceptable for medium or large sample sizes. The bump hunting method has much lower stability than the other methods when the true proportion of differentially methylated loci is large, a caveat to be considered when choosing the bump hunting method.
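For readers unfamiliar with the two scales, the standard logit relation between β values and M values (a well-known identity, not specific to this study) can be sketched as:

```python
import numpy as np

def beta_to_m(beta, eps=1e-6):
    """M = log2(beta / (1 - beta)); eps guards against beta of exactly 0 or 1."""
    beta = np.clip(beta, eps, 1 - eps)
    return np.log2(beta / (1 - beta))

def m_to_beta(m):
    """Inverse transform: beta = 2^M / (2^M + 1)."""
    return 2.0 ** m / (2.0 ** m + 1)

print(beta_to_m(np.array([0.1, 0.5, 0.9])))  # approx [-3.17, 0.0, 3.17]
```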
Abdel-Salam Gomaa Abdel-Salam
Qatar University
Qatar
Title: Outlier robust non-linear mixed model estimation on gene expression
Time : 12:25-12:45
Biography:
Abdel-Salam Gomaa Abdel-Salam holds BS and MS (2004) degrees in Statistics from Cairo University and MS (2006) and PhD (2009) degrees in Statistics from Virginia Polytechnic Institute and State University (Virginia Tech, USA). Prior to joining Qatar University as an Assistant Professor and Coordinator of the Statistical Consulting Unit, he taught at the Faculty of Economics and Political Science (Cairo University), Virginia Tech, and Oklahoma State University. He also worked at J P Morgan Chase Co. as Assistant Vice President in the Mortgage Banking and Business Banking Risk Management sectors. He has published several research papers and delivered numerous talks and workshops. He has received some of the most prestigious awards, such as the Teaching Excellence Award from Virginia Tech, the Academic Excellence Award, the Freud International Award, and the Mary G Natrella Scholarship from the American Statistical Association (ASA) and the American Society for Quality (ASQ) for outstanding graduate study of the theory and application of quality control, quality assurance, quality improvement, and total quality management. He is a member of the ASQ and ASA. He was also awarded the Start-Up Grant Award from Qatar University (2014/15) and the Cairo University Award for international publication in 2014. His research interests include all aspects of industrial statistics and economic capital models, including statistical process control, multivariate analysis, regression analysis, exploratory and robust data analysis, mixed models, non-parametric and semi-parametric profile monitoring, health-related monitoring and prospective public health surveillance.
Abstract:
In standard analyses of data well-modeled by a Non-Linear Mixed Model (NLMM), an aberrant observation, either within a cluster or an entire cluster itself, can greatly distort parameter estimates and subsequent standard errors; consequently, inferences about the parameters are misleading. This paper proposes an outlier-robust method based on linearization to estimate fixed-effects parameters and variance components in the NLMM. An example is given using the 4-parameter logistic model on gene expression and environment datasets, comparing the robust parameter estimates to the non-robust estimates.
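As a rough illustration of outlier-resistant estimation for a 4-parameter logistic (4PL) curve — this is not the paper's linearization-based NLMM estimator, which additionally handles random effects; the parameterization, starting values and loss are assumptions — one can fit the fixed-effects part with a robustified loss:

```python
import numpy as np
from scipy.optimize import least_squares

def fourpl(theta, x):
    """4PL curve: lower asymptote a, slope b, inflection c, upper asymptote d."""
    a, b, c, d = theta
    return d + (a - d) / (1.0 + (x / c) ** b)

rng = np.random.default_rng(0)
x = np.linspace(0.1, 10, 40)
y = fourpl([0.0, 2.0, 3.0, 1.0], x) + rng.normal(0, 0.02, x.size)
y[5] += 0.8  # one aberrant observation

fit = least_squares(lambda t: fourpl(t, x) - y, x0=[0, 1, 1, 1],
                    loss="soft_l1", f_scale=0.05)  # robust loss downweights the outlier
print(fit.x)
```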
Francisco Louzada
University of Sao Paulo
Brazil
Title: A unified multivariate survival model in the presence of a cure fraction
Time : 12:45-13:05
Biography:
Francisco Louzada completed his PhD at the University of Oxford. He is the Director of Technology Transfer and Executive Director of External Relations of the Center for Research, Innovation and Dissemination of Mathematical Sciences in Industry (CEPID-CeMEAI) in Brazil. He has published more than 190 papers in reputed journals and serves as Editor-in-Chief of the Brazilian Journal of Probability and Statistics (BJPS).
Abstract:
In this talk, I present a new lifetime model for multivariate survival data with a surviving fraction. The model is developed under the presence of m types of latent competing risks and a proportion of surviving individuals. Inference is based on the maximum likelihood approach. A simulation study is performed to analyze the coverage probabilities of confidence intervals based on the asymptotic distribution of the maximum likelihood estimates. The proposed modeling is illustrated through a real dataset from the medical area.
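For orientation, two textbook ways of building a cure fraction into a survival model (standard formulations for context, not necessarily the unified model of the talk) are:

```latex
% Mixture cure model: a fraction \pi never experiences the event
S_{\mathrm{pop}}(t) = \pi + (1-\pi)\,S_0(t)

% Promotion-time form with a latent Poisson(\theta) number of competing risks,
% each with latent lifetime distribution F; the implied cure fraction is e^{-\theta}
S_{\mathrm{pop}}(t) = \exp\{-\theta F(t)\}
```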
Xing Li
Mayo Clinic
USA
Title: Workshop on Rcircle: A gene-oriented R package for integrating multiple ‘-omics’ data to prioritize disease-related hub genes and networks
Time : 13:40-14:40
Biography:
Xing Li completed his PhD in Bioinformatics at The University of Michigan, Ann Arbor. He is an Assistant Professor in the Division of Biomedical Statistics and Informatics, Department of Health Science Research at Mayo Clinic, which was recognized as the best hospital for 2014-2015 by US News & World Report. He has published more than 18 papers in reputed journals and serves as a peer reviewer for over a dozen journals, such as Genomics, BMC Bioinformatics, Stem Cell Research, PLOS ONE, and Physiological Genomics.
Abstract:
Biomedical science has entered the big data era, and biologists today have access to an overwhelming abundance of data due to the rapid advancement of high-throughput sequencing and microarray technology during the last decade. The tremendous volume, high dimensionality and diverse types of data pose an unprecedented challenge for data visualization and integration, efficient data exploration and effective scientific communication. Herein, we developed an extendable gene-oriented R package to integrate and visualize interactome data (especially gene interaction networks with large numbers of nodes), time-course transcriptome data (especially transcriptional data with more than three time points, up to dozens of stages), disease-related genetic factors, and disease-affected pathways/networks to facilitate gene prioritization and knowledge discovery. This gene-oriented R package is powerful for visualizing and integrating multiple ‘-omics’ data to prioritize actionable genes and facilitate biomedical knowledge discovery. The package is freely available on the R website. One paper applying the Rcircle package to a human dilated cardiomyopathy study was featured as a cover story in Human Molecular Genetics, and another paper was recommended as a featured article by the Editor of Physiological Genomics. In this workshop, I will demonstrate the usage of this package in prioritizing disease-related genes in congenital heart defects by integrating time-course transcriptome data, interactome data (gene interaction information), disease information, and involved pathway function groups. In addition, the package is flexible enough to integrate other types of ‘-omics’ data as well, including whole genome sequencing, exome-seq, etc.
Fulvia Pennoni
University of Milano-Bicocca
Italy
Title: Causal analysis of the relation between epigenetic pathways and air pollution based on the joint use of mixed latent Markov models and the propensity score method
Time : 14:40-15:00
Biography:
Fulvia Pennoni completed her PhD in Statistics at Florence University and her Postdoctoral studies at the Joint Research Centre of the European Commission. After working as a Researcher, she is now an Associate Professor of Statistics in the Department of Statistics and Quantitative Methods of the University of Milano-Bicocca. She recently published two books and several articles in the main international statistical journals. She serves as a referee for journals in the field of mathematical and statistical models.
Abstract:
We propose a novel model for longitudinal studies based on random effects to capture unobserved heterogeneity. We aim at extending the latent Markov Rasch model, which is specially tailored to deal with confounders and missing data on the primary response. The model is based on both time-fixed and time-varying latent variables having a discrete distribution; in particular, the time-varying latent variables are assumed to follow a Markov chain. The model estimation is performed by the maximum likelihood procedure through the EM algorithm. This estimation is based on a set of weights associated with each subject to balance the composition of the different sub-samples corresponding to different treatments/exposures, in a perspective of causal inference. These weights are computed by the propensity score method. The model is applied to the analysis of epidemiological and molecular data from the Normative Aging Study (NAS), a longitudinal cohort of older individuals, to identify key epigenetic pathways in humans that reflect air pollution exposure and predict worse cognitive decline. The participants have been assigned estimates of black carbon exposure, a measure of diesel particles, since 2010; have epigenome-wide Illumina Infinium 450K Methylation BeadChip data for methylation at ~486,000 DNA sites measured at two different time points; and are administered cognitive testing assessing multiple functional domains every 3-5 years. We will consider DNA methylation as a possible intermediate variable mediating the effects of air pollution on cognitive aging. Epigenetic profiles may represent cumulative biomarkers of air pollution exposures and aid in the early diagnosis and prevention of air pollution-related diseases.
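The propensity score weighting step described above can be sketched as follows (a generic inverse-probability-of-treatment sketch with hypothetical variable names; in the study these weights feed into the EM algorithm for the latent Markov model rather than a direct comparison):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_weights(X, treated):
    """Inverse-probability-of-treatment weights from an estimated propensity score."""
    ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
    ps = np.clip(ps, 0.01, 0.99)  # trim extreme scores for stability
    return np.where(treated == 1, 1.0 / ps, 1.0 / (1.0 - ps))

# toy usage: X = confounders, treated = high vs. low exposure (hypothetical labels)
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
treated = rng.integers(0, 2, 200)
w = ipw_weights(X, treated)  # weights that balance the exposure groups
print(w[:5])
```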
Ken Williams
KenAnCo Biostatistics
University of Texas
USA
Title: A meta-analysis of meta-analyses comparing LDL cholesterol, non-HDL cholesterol, and apolipoprotein B as markers of vascular risk
Time : 15:00-15:20
Biography:
Ken Williams received a BS in Applied Math from Georgia Tech in 1971 and an MS in Operations Research from the Air Force Institute of Technology in 1980. He served in the US Air Force for 22 years in computer systems and scientific analysis. He also served 10 years as a Biostatistician at the University of Texas Health Science Center at San Antonio, where he remains an Adjunct Faculty Member. He has been a freelance Biostatistician with KenAnCo Biostatistics since 2007. Designated a Professional Statistician (PStat) in the inaugural 2011 class, he has published more than 100 papers in peer-reviewed journals.
Abstract:
This talk will combine and compare two meta-analyses. One included all the published epidemiological studies containing estimates of the relative risks of LDL-C, non-HDL-C, and apoB for predicting fatal or non-fatal ischemic cardiovascular events. Twelve independent reports, including 233,455 subjects and 22,950 events, were analyzed. Standardized relative risk ratios and confidence intervals were LDL-C: 1.25 (1.18, 1.33), non-HDL-C: 1.34 (1.24, 1.44) and apoB: 1.43 (1.35, 1.51), the latter 5.7% greater than non-HDL-C (p<0.001) and 12.0% greater than LDL-C (p<0.001). The other meta-analysis included 7 placebo-controlled statin trials in which LDL-C, non-HDL-C, and apoB values were available. Mean CHD risk reductions (95% CI) per standard deviation decrease in each marker across these 7 trials were 20.1% (15.6%, 24.3%) for LDL-C; 20.0% (15.2%, 24.7%) for non-HDL-C; and 24.4% (19.2%, 29.2%) for apoB, the latter 21.6% (12.0%, 31.2%) greater than LDL-C (p<0.001) and 24.3% (22.4%, 26.2%) greater than non-HDL-C (p<0.001). The inverses of the treatment HRs from the trial meta-analysis were similar to the risk ratios from the observational meta-analysis, indicating that parameters from both kinds of studies may be useful for projecting the number of events that can be avoided under different preventive treatment strategies.
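As background on the pooling step, a minimal fixed-effect inverse-variance meta-analysis on the log relative-risk scale looks like the sketch below (illustrative only; the published analyses use more elaborate models, and the input RRs here are hypothetical study-level values):

```python
import numpy as np

def pool_log_rr(rr, lo, hi):
    """Fixed-effect inverse-variance pooling of study-level RRs given 95% CIs."""
    log_rr = np.log(rr)
    se = (np.log(hi) - np.log(lo)) / (2 * 1.96)   # back out SE from the CI width
    w = 1.0 / se**2
    m = np.sum(w * log_rr) / np.sum(w)
    s = np.sqrt(1.0 / np.sum(w))
    return np.exp([m, m - 1.96 * s, m + 1.96 * s])  # pooled RR and its 95% CI

# three hypothetical study-level RRs (with 95% CIs) for one marker
print(pool_log_rr(np.array([1.30, 1.45, 1.38]),
                  np.array([1.15, 1.20, 1.22]),
                  np.array([1.47, 1.75, 1.56])))
```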
Abdelsalam G Abdelsalam
Qatar University
Qatar
Title: Workshop on Advanced statistical analysis using SPSS
Time : 15:20-16:20
Biography:
Abdelsalam G Abdelsalam holds BS and MS (2004) degrees in Statistics from Cairo University and MS (2006) and PhD (2009) degrees in Statistics from Virginia Polytechnic Institute and State University (Virginia Tech, USA). Prior to joining Qatar University as an Assistant Professor and Coordinator of the Statistical Consulting Unit, he taught at the Faculty of Economics and Political Science (Cairo University), Virginia Tech, and Oklahoma State University. He also worked at J P Morgan Chase Co. as Assistant Vice President in the Mortgage Banking and Business Banking Risk Management sectors. He has published several research papers and delivered numerous talks and workshops. He has received some of the most prestigious awards, such as the Teaching Excellence Award from Virginia Tech, the Academic Excellence Award, the Freud International Award, and the Mary G Natrella Scholarship from the American Statistical Association (ASA) and the American Society for Quality (ASQ) for outstanding graduate study of the theory and application of quality control, quality assurance, quality improvement, and total quality management. He is a member of the ASQ and ASA. He was also awarded the Start-Up Grant Award from Qatar University (2014/15) and the Cairo University Award for international publication in 2014. His research interests include all aspects of industrial statistics and economic capital models, including statistical process control, multivariate analysis, regression analysis, exploratory and robust data analysis, mixed models, non-parametric and semi-parametric profile monitoring, health-related monitoring and prospective public health surveillance.
Abstract:
This workshop is designed to cover advanced and common statistical techniques across various disciplines that are very important for researchers and practitioners using SPSS. It is designed especially to demonstrate techniques and applications of biostatistics in the medical, public health, epidemiology, pharmaceutical and biomedical fields. The following topics will be dealt with in the workshop:
• Introduction to inferential statistics and SPSS
• Hypothesis testing and confidence interval analysis for one and two populations
• One- and two-way factorial ANOVA with SPSS
• Post hoc multiple comparisons
• Pearson correlation and scatter plots of the results
• Linear regression analysis (simple and multiple) and logistic regression
• How to interpret SPSS outputs and incorporate the results into a report
Adel Aloraini
Al Baha University
Saudi Arabia
Title: A directed acyclic graphical approach with ensemble feature selection for a better drug development strategy using partial knowledge from KEGG signaling pathways
Time : 16:35-16:55
Biography:
Adel Aloraini received his Master’s degree from the Informatics School at Bradford University, UK in 2007. He then received his PhD degree from the Computer Science Department at the University of York, UK in 2011. In 2013, he was appointed Head of the Computer Science Department and later was nominated as Dean of the Computer Science and Information Technology College at Al Baha University. He has been Head of the Machine Learning and Bioinformatics Group at Qassim University since 2012, and in 2015 he was appointed an Associate Member of the York Centre for Complex Systems Analysis (YCCSA), University of York, UK. He has also served on several program committees worldwide and is a reviewer for various journals.
Abstract:
Genes and proteins along KEGG signaling pathways are grouped according to different criteria, such as gene duplication and co-operativity. Gene duplication is a process by which a chromosome or a segment of DNA is duplicated, resulting in an additional copy of the gene that undergoes mutations, thereby creating a new, functionally distinct gene that shares important characteristics with the original gene. Similar sequences of DNA building blocks (nucleotides) are an example of such duplication. Gene co-operativity is another criterion for grouping genes and proteins into one family in KEGG signaling pathways, and it can be manifested at the protein-protein interaction level. At large, KEGG signaling pathways encode high-level knowledge of the structural interactions between genes, but with some limitations. In this presentation, we will discuss our recent approach for revealing more details about inter-family interactions in KEGG signaling pathways using gene expression profiles from Affymetrix microarrays, which in turn shows a promising avenue for a better drug development strategy. Learning from gene expression profiles is, however, problematic given that the number of genes usually exceeds the number of samples, known as the p>>n problem. Learning in such a high-dimensional space requires solvers to consider the most relevant genes that best explain the cellular system being studied. Hence, in this presentation we will show how we tackled this problem by developing a machine-learning graphical-model approach that utilizes novel ensemble feature selection methods to justify the choice of the most relevant features, genes in this case.
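One generic way to realize ensemble feature selection under p >> n is a stability-selection-style aggregation of the features chosen by an L1 model across bootstrap resamples (a sketch under that assumption, not the authors' exact method; all parameter values are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

def stable_features(X, y, n_boot=100, alpha=0.1, thresh=0.6, seed=0):
    """Indices of features selected in at least `thresh` fraction of bootstraps."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    freq = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)  # bootstrap resample of the samples
        coef = Lasso(alpha=alpha, max_iter=5000).fit(X[idx], y[idx]).coef_
        freq += (coef != 0)
    return np.where(freq / n_boot >= thresh)[0]

# toy p >> n example: 30 samples, 200 "genes", only the first 3 informative
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 200))
y = X[:, 0] - 2 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(0, 0.1, 30)
print(stable_features(X, y))
```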
Ajit Kumar Roy
Central Agricultural University Tripura
India
Title: Web resources for bioinformatics, biotechnology and life sciences research
Time : 16:55-17:15
Biography:
Ajit Kumar Roy obtained his MSc degree in Statistics and joined the Agricultural Research Service (ARS) of the Indian Council of Agricultural Research (ICAR) as a Scientist in 1976. He has edited eighteen books and several conference proceedings covering statistics, bioinformatics, economics, and ICT applications in aquaculture, fisheries, agriculture and allied fields, and has published over 100 articles in refereed journals and conference proceedings. He is a highly acclaimed researcher and consultant. His recent popular books are ‘Applied Big Data Analytics’; ‘Self Learning of Bioinformatics Online’; ‘Applied Bioinformatics, Statistics and Economics in Fisheries Research’; and ‘Applied Computational Biology and Statistics in Biotechnology and Bioinformatics’. He is widely recognized as an expert research scientist, teacher, author, and hands-on leader in advanced analytics. He has served as National Consultant (Impact Assessment), Consultant (Statistics), Computer Specialist and Principal Scientist at various organizations at national and international levels. Presently he is a Visiting Professor at four Indian universities.
Abstract:
The vast amount of information generated has made computational analysis critical and has increased demand for skilled bioinformaticians. Thousands of bioinformatics and genomics resources are free and publicly accessible. However, finding the right resource and learning how to use its complex features and functions can be difficult. In this communication, I explore ways to quickly find resources and effectively learn how to use them, including a tour of example resources organized by categories such as algorithms and analysis tools, expression resources, genome browsers, and literature and text mining resources. One can learn how to find resources with the OpenHelix free search interface, which searches hundreds of genomics resources, tutorial suites, and other material to deliver the most relevant resources in seconds. I have documented web-based tools and technologies, resources, web content, blog posts, videos, webinars, and websites that facilitate easy access and use, saving the time and effort of massive generalized searches or hunting and picking through lists of databases. The purpose of the documentation is to bridge the gap between the rising information needs of biological and medical researchers and the rapidly growing number of online bioinformatics resources. The freely available, searchable databases are arranged by categories and sub-categories such as databases and analysis tools, proteomics resources, and enzymes and pathways. Key programming tools and technologies used in bioinformatics and molecular evolutionary research are covered, and those interested in learning basic biocomputing skills will find links to selected online tutorials.
Nazanin Nooraee
Eindhoven University of Technology
The Netherlands
Title: Strategies for handling missing outcomes in longitudinal questionnaire surveys
Time : 17:15-17:35
Biography:
Nazanin Nooraee completed her PhD in 2015 at the University of Groningen, the Netherlands. Her main interest is in applied statistics with a longitudinal analysis orientation. Currently, she is a Postdoctoral Research Fellow at Eindhoven University of Technology, the Netherlands.
Abstract:
Missing data is a pervasive issue in data collection, particularly in questionnaire data. A questionnaire is a tool for obtaining data from individuals, and commonly a (predefined) function of these data (e.g., a sum scale or mean scale) is analyzed. Even one unanswered item (question) leads to a missing scale score, and failing to tackle this issue may result in biased parameter estimates and misleading conclusions. Although numerous methods have been developed for dealing with missing data, comparing their performance on questionnaire data has received less attention. In the current study, the performance of different missing data methods was investigated via simulation. We used maximum likelihood and multiple imputation approaches, either at the item level or at the scale level. Furthermore, we implemented a hybrid approach that combines the advantages of both aforementioned methods. Parameter estimates were examined in terms of bias and Mean Square Error (MSE) relative to an analysis of the full data set.
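A minimal sketch of item-level multiple imputation followed by scale scoring is shown below, using scikit-learn's IterativeImputer as a stand-in for a full MI engine such as MICE; the Likert items, missingness rate and number of imputations are all hypothetical:

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
items = rng.integers(1, 6, size=(100, 10)).astype(float)  # 10 Likert items
items[rng.random(items.shape) < 0.1] = np.nan             # ~10% item nonresponse

# m imputed datasets; each sum scale would be analyzed and pooled (Rubin's rules)
m = 5
sum_scales = []
for k in range(m):
    imp = IterativeImputer(sample_posterior=True, random_state=k)
    completed = imp.fit_transform(items)
    sum_scales.append(completed.sum(axis=1))  # impute items, then score the scale
print(np.mean([s.mean() for s in sum_scales]))
```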
Mohammed Imran
University of Dammam
Saudi Arabia
Title: Biostatistics with computational intelligence
Biography:
Mohammed Imran completed his PhD at Jamia Millia Islamia (a Central University), New Delhi, India. He is presently working as an Assistant Professor in the Department of Computer Science, University of Dammam. He has published more than 25 papers in reputed journals and conferences and serves as an Editorial Board Member for reputed journals.
Abstract:
In a world of imprecise, partial, and vague information, crisp logic no longer produces good results, especially for medical science images. Fuzzy logic, by contrast, defines multiple output values for imprecise, partial and vague information in input images. In such cases, decision making with crisp logic does not yield fuzzy-valid solutions, also called f-valid solutions. In most cases, fuzzy-valid solutions play an important role that cannot be neglected. Therefore, the fuzzy-valid solution for multiple parameters is calculated using Ordered Weighted Averaging.
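The Ordered Weighted Averaging (OWA) operator mentioned at the end has a simple closed form: sort the inputs in decreasing order and take a weighted sum. A minimal sketch (the membership values below are hypothetical):

```python
import numpy as np

def owa(values, weights):
    """OWA: the weights apply to the sorted (descending) values, not to fixed positions."""
    values = np.sort(np.asarray(values, dtype=float))[::-1]
    weights = np.asarray(weights, dtype=float)
    assert np.isclose(weights.sum(), 1.0) and values.size == weights.size
    return float(values @ weights)

# membership degrees from three fuzzy criteria (hypothetical image features)
print(owa([0.7, 0.2, 0.9], [0.5, 0.3, 0.2]))  # 0.70: emphasizes the larger memberships
```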
Mohammad Imran
King Faisal University
Saudi Arabia
Title: Machine learning approaches for biomedical and biometric data
Biography:
Mohammad Imran received his PhD in Computer Science in 2012 with a thesis titled ‘Some Issues Concerning Biometric System’ under the guidance of Professor G Hemantha Kumar. During 2012-13, he was a Postdoctoral Fellow under a TRC (The Research Council, Oman) sponsored program. Currently, he is working as an Assistant Professor at King Faisal University, Saudi Arabia. Prior to this, he was an Associate Consultant at Wipro Technologies, Bangalore. His research interests include machine learning, pattern recognition, computer vision, biometrics, image processing, predictive analysis, algorithms, data structures, and linear algebra. He has authored 25 international publications in journals and peer-reviewed conferences.
Abstract:
The aim of this talk is to apply a particular category of machine learning and pattern recognition algorithms, namely supervised and unsupervised methods, to both biomedical and biometric images/data, with a specific focus on supervised learning. Both methodological and practical aspects are described. The presentation is in two parts. In the first part, I will introduce data preparation concepts involved in preprocessing: after a quick overview, I will cover dimensions/features, the curse of dimensionality, understanding data impurities such as missing data and outliers, data transformations, scaling, estimation, normalization, smoothing, etc. In the second part, I will discuss issues and challenges specific to: 1. supervised and unsupervised learning, 2. statistical learning theory, 3. errors and noise, 4. the bias-variance tradeoff, 5. the theory of generalization, 6. training vs. testing, and 7. over-fitting vs. under-fitting. The talk will conclude with a summary of learning concepts and their applications, a survey of results on biometric and biomedical data, and a discussion of future challenges.
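One formula the talk names explicitly is the bias-variance tradeoff; for squared loss it decomposes as (the standard textbook result, stated here for orientation):

```latex
\mathbb{E}\big[(y - \hat f(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat f(x) - \mathbb{E}[\hat f(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```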
Amal Khalifa
Princess Norah University
Saudi Arabia
Biography:
Amal Khalifa is currently working as an Assistant Professor of Computer Science at the College of Computer & Information Sciences, Princess Norah University in KSA. In addition, she is the Vice-Head of the Computer Science Department and a member of the program’s accreditation committee. She graduated in 2000 from the Scientific Computing Department at the Faculty of Computers & Information Science, Ain Shams University, Egypt. She worked as a Teaching Assistant and earned her MSc in the field of information hiding in images in 2005. In 2006, she was granted a scholarship in a joint supervision program between the Egyptian Ministry of Higher Education and the University of Connecticut, USA, earning her PhD in the area of high-performance computing in 2009. Her main research interests are steganography and watermarking applications, high performance computing, cloud computing, computational biology and security.
Abstract:
People are always looking for secure methods to protect valuable information against unauthorized access or use. That is why disciplines like cryptography and steganography are gaining great interest among researchers. Although the origin of steganography goes back to the ancient Greeks, recent information hiding techniques embed data into various digital media such as sound, images, and videos. In this research, however, we took a step further to utilize DNA sequences as a carrier of secret information. Two techniques will be presented. The first method hides the secret message in a reference DNA sequence using a generic substitution technique followed by a self-embedding phase, such that the extraction process can be done blindly using only the secret key. The second technique is called Least Significant Base substitution, or LSBase for short. This method exploits a remarkable property of codon redundancy to introduce silent mutations into DNA sequences, so that the carrier DNA sequence can be altered without affecting either the type or the structure of the protein it produces.
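A rough sketch of the LSBase idea follows: hiding bits in synonymous third codon positions, restricted here to four-fold degenerate codon families so the encoded amino acids provably cannot change. The bit-to-base mapping and helper names are illustrative assumptions; the published scheme may differ in detail.

```python
# Codon prefixes that are four-fold degenerate: any third base gives the same amino acid
FOURFOLD = {"GC", "GG", "CC", "CG", "CT", "GT", "TC", "AC"}
BASE_FOR_BIT = {"0": "A", "1": "G"}   # illustrative bit-to-base mapping
BIT_FOR_BASE = {"A": "0", "C": "0", "G": "1", "T": "1"}

def embed(dna, bits):
    codons = [dna[i:i + 3] for i in range(0, len(dna) - 2, 3)]
    out, k = [], 0
    for c in codons:
        if k < len(bits) and c[:2] in FOURFOLD:
            c = c[:2] + BASE_FOR_BIT[bits[k]]  # silent mutation carries one bit
            k += 1
        out.append(c)
    assert k == len(bits), "carrier sequence too short"
    return "".join(out)

def extract(dna, n_bits):
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    bits = [BIT_FOR_BASE[c[2]] for c in codons if c[:2] in FOURFOLD]
    return "".join(bits[:n_bits])

stego = embed("GCTTTGGGACTAACCGTT", "101")
print(stego, extract(stego, 3))  # the protein sequence is unchanged
```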
- Track 1: Medical and Clinical Biostatistics; Track 3: Bayesian Statistics; Track 4: Regression Analysis
Session Introduction
Aiguo Li
National Institutes of Health
USA
Title: Glioma classification and translational application in clinics
Time : 10:50-11:10
Biography:
Aiguo Li is a Senior Bioinformatician in the Neuro-Oncology Branch, National Cancer Institute, National Institutes of Health. Her research interests include glioma tumor biomarker identification and prognostic studies, and tumor molecular and functional classification. The goals of her research are to understand the underlying molecular mechanisms of glioma tumorigenesis and progression, and to develop novel therapeutic approaches for curing glioma patients. She has previously established a glioma molecular classification model containing six sub-types and translated this model into a clinically useful tool, GliomaPredict, which allows clinicians and researchers to assign new patients to the existing sub-types. Her current research focuses on understanding glioma tumor heterogeneity and infiltration mechanisms through integrative analysis of multi-dimensional high-throughput data from glioma patients and glioma stem cell lines.
Abstract:
Gliomas are the most common type of primary brain tumor in adults and a significant cause of cancer-related mortality. Defining glioma sub-types based on objective genetic and molecular signatures may allow for a more rational, patient-specific approach to therapy in the future. Applying two unsupervised machine-learning methods to 159 glioma patient gene expression profiles ranging from low- to high-grade gliomas, we have established a glioma classification model containing six distinct sub-types. These sub-types were validated using three additional datasets and annotated for underlying molecular functions. To translate this glioma classification model into clinical application, we developed a web-based tool, GliomaPredict, for assigning new patients to our molecular sub-types. The classification model also enables us to study glioma progression mechanisms in patient cohorts and to design targeted clinical trials.
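A generic flavor of the unsupervised sub-typing step is sketched below (illustrative only; the study's two methods and validation pipeline are not specified in the abstract, so k-means on synthetic data stands in here):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
expr = rng.normal(size=(159, 1000))        # 159 patients x 1000 genes (synthetic stand-in)

X = StandardScaler().fit_transform(expr)   # standardize genes before clustering
labels = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))                 # sizes of the six putative sub-types
```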
Huaping Wang
University of Illinois at Chicago
USA
Title: Using different methods to analyze pre-post unmatched survey data in practical medical research
Time : 11:10-11:30
Biography:
Huaping Wang completed her PhD in Statistics at the University of Alabama in 2007. She is a Research Assistant Professor at the University of Illinois College of Medicine at Peoria, Division of Research Services. She has worked on projects from many different medical fields, such as pediatrics, medicine, emergency medicine, neurology, neurosurgery, surgery, OB/GYN, and radiology. She has many years of experience providing statistical consulting to UIC faculty, residents, MD students, and local medical research investigators.
Abstract:
Some problems arise when analyzing pre-post survey studies that use anonymous questionnaires: the pre and post data are related, but it is impossible to match the pre and post responses of the same responders. Teen drivers have a higher crash risk when they drive after consuming alcohol. The American Red Cross Central Illinois Chapter conducts “Operation Prom Night” crash re-enactment programs with local high school students and the Illinois Department of Transportation, with the goal of teaching students about the dangers of drinking and driving. The survey objective is to measure changes in knowledge and behavior related to drinking and driving. In this situation, some students answered both the pre and post surveys, while others answered only the pre survey or only the post survey, and pre and post data could not be paired for the same student. The data came from 13 schools between 2007 and 2011. We used independent-sample statistical models to analyze the data in two different ways: one used all the data, while the other used part of the data, but independently. To create independent samples, we randomly chose 6 of the 13 schools as group 1 and the others as group 2, and used the pre data from one group and the post data from the other. Both methods decreased power: the first because of the extra variability introduced by not knowing the individual, and the second because of the reduced sample size. All results were compared and led to similar conclusions.
Ahmed A Bahnassy
King Fahad Medical City College of Medicine
Saudi Arabia
Title: Teaching biostatistics to undergraduate Saudi medical students
Time : 10:00-10:30
Biography:
Ahmed A Bahnassy completed his PhD at Tulane University. He is a Professor of Biostatistics and Medical Research in the College of Medicine, King Fahad Medical City, Saudi Arabia. He has published more than 130 papers in reputed journals and serves as a referee for 6 reputed journals.
Abstract:
Background: The knowledge and ability to use biostatistical techniques have become increasingly important in the management of data and interpretation of results of different studies related to the health sciences. Understanding biostatistical methods may improve clinical thinking, decision making, evaluations, medical research and evidence-based healthcare. Unfortunately, there is an increasing gap between medical students and the mathematical notation usually part-and-parcel of medicine and biostatistics, and the lack of connection between medical curricula and introductory courses in statistics has created negative attitudes among medical students. This study aimed to evaluate the effect of a new teaching method, based on a mix of theoretical concepts and their applications using computer facilities, for teaching biostatistics to undergraduate medical students in a Saudi Faculty of Medicine, and to measure the students’ ability to understand the results of statistical tests in computer output and interpret them in meaningful text. Methods: The new method of teaching biostatistics is based on teaching theoretical concepts and applying the lectures to a real clinical dataset using PC-SPSS software. The results of this new teaching method were compared to the conventional method (based on lectures and scientific calculations) in two classes in two academic years. Results: The two classes comprised 114 students, 57 in each class. The new method’s class scored significantly higher than the class taught with the traditional method in the following topics: measures of variability, confidence intervals, hypothesis testing, t-tests, Analysis of Variance (ANOVA), multiple linear regression and data presentation. Writing and interpreting results showed a borderline statistical difference between the two methods. Mean satisfaction scores of the students toward biostatistics were significantly higher for the new method than the traditional one (p<0.05). Conclusion: Teaching biostatistics to medical students should avoid hand calculations; using computer software for the analysis of real medical data is recommended.
Ram Shanmugam
Texas State University
USA
Title: Biostatistics can help to comprehend the shortage, illegal trade, and unmet demand of organ or tissue transplant
Time : 13:50-14:10
Biography:
Ram Shanmugam is the Editor-in-Chief of the journals Epidemiology & Community Medicine, Advances in Life Sciences and Health, and Global Journal of Research and Review. He is an Associate Editor of the International Journal of Research in Medical Sciences and the Book-Review Editor of the Journal of Statistical Computation and Simulation. He directed the Statistics Consulting Center at Mississippi State University. In 2015, he published a textbook titled “Statistics for Engineers and Scientists”. He has served at Argonne National Laboratory, the University of Colorado, the University of South Alabama and the Indian Statistical Institute. He has published 125 research articles and is a Fellow of the International Statistical Institute. Currently, he is a Professor in the School of Health Administration, Texas State University, and a recipient of several research awards from the university.
Abstract:
The public in general, and healthcare professionals in particular, are confused by unclear and conflicting information in organ transplant related data. To sort out and clarify such confusion, a statistical methodology is constructed and demonstrated in this article. The gap between the number of organ donors and the number waiting for an organ transplant is named the shortage; the gap between the number of organ donors and the number of recipients is named the illegal organ trade level; and the gap between the number of organ recipients and the number waiting for an organ transplant is named the unmet organ demand. Expressions are derived, based on a statistical methodology, to compute confidence intervals for these true unknown gaps. A few recommendations are compiled and stated at the end for closing such gaps, for the sake of those waiting for an organ transplant to have a quality life.
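One simple way to attach a confidence interval to a gap between two counts is a normal approximation for the difference of independent Poisson counts, sketched below. This is generic background, not the expressions derived in the talk, and the counts used are hypothetical:

```python
import math

def poisson_gap_ci(x, y, z=1.96):
    """Wald 95% CI for the gap x - y, treating x and y as independent Poisson counts."""
    gap = x - y
    half = z * math.sqrt(x + y)  # Var(x - y) = Var(x) + Var(y) = x + y under Poisson
    return gap, gap - half, gap + half

# hypothetical counts: 120,000 on the waiting list vs. 17,000 donors
print(poisson_gap_ci(120_000, 17_000))
```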
Walid Sharabati
Purdue University
USA
Title: A regression approach for noise removal in image analysis
Time : 14:10-14:30
Biography:
Walid Sharabati joined the Department of Statistics at Purdue University in the fall of 2008. He earned his PhD in Computational Statistics from George Mason University and has an MS in Mathematics and Computer Science from Minnesota State University. His main research interests are social networks, preferential attachment, text mining, stochastic processes, statistical modeling, Gaussian mixture models, and statistical models for image de-blurring and de-noising. He has been serving as an Editorial Board Member of Austin Statistics and Enliven: Biostatistics and Metrics.
Abstract:
Recording an image that is sharp and clear is sometimes challenging, and perturbations are inevitable. Brain CAT scans, for example, may contain noisy regions, and ultrasound images may have unclear objects. Galactic images may be blurred and noisy because the light is bent by the time it reaches the camera or because of dust in outer space. The objective of image de-noising is to reduce the noise generated during the process of capturing an image, which is due to several factors such as a bad camera sensor, faulty memory locations or noise in the transmission channel. In this talk, I present a novel approach to tackling Gaussian noise introduced into an image at different levels, using a multiple regression model in which the neighboring pixels are the predictors used to estimate the pixel value (color) on the grid. To this end, I consider balls of varying radius around the predicted pixel. The underlying algorithm portrays a typical inverse problem that requires the introduction of a regularization term to the system. Finally, I utilize the structural similarity index (SSIM) for images and the peak signal-to-noise ratio (PSNR) to assess the performance of the model at different noise levels and radii. The results are promising and produce a high similarity measure between the de-noised and original sharp images.
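A minimal version of the neighborhood-regression idea is sketched below, with ridge regularization standing in for the regularization term mentioned in the abstract; the radius, penalty and test image are illustrative assumptions, not the speaker's model:

```python
import numpy as np
from sklearn.linear_model import Ridge

def denoise(img, r=2, alpha=10.0):
    """Predict each pixel from its neighbors within Chebyshev radius r (center excluded)."""
    pad = np.pad(img, r, mode="reflect")
    offs = [(dy, dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)
            if (dy, dx) != (0, 0)]
    X = np.stack([pad[r + dy: r + dy + img.shape[0],
                      r + dx: r + dx + img.shape[1]].ravel()
                  for dy, dx in offs], axis=1)   # one column per neighbor offset
    y = img.ravel()
    model = Ridge(alpha=alpha).fit(X, y)          # regularized inverse problem
    return model.predict(X).reshape(img.shape)

def psnr(clean, est, peak=1.0):
    mse = np.mean((clean - est) ** 2)
    return 10 * np.log10(peak**2 / mse)

rng = np.random.default_rng(0)
clean = np.tile(np.linspace(0, 1, 64), (64, 1))   # smooth synthetic test image
noisy = clean + rng.normal(0, 0.1, clean.shape)   # additive Gaussian noise
print(psnr(clean, noisy), psnr(clean, denoise(noisy)))
```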
Ali Seifi
University of Texas Health Science Center
USA
Title: Healthcare Cost and Utilization Project (HCUP), a research tool
Time : 14:30-14:50
Biography:
Ali Seifi, MD, FACP, is an Assistant Professor in the Department of Neurosurgery at the University of Texas Health Science Center San Antonio, where he has served as an attending physician since 2012. He oversees the care of patients in University Hospital’s state-of-the-art Neuro Intensive Care Unit (NICU) and leads the unit’s daily operations. This NICU is the only intensive care unit in the region fully dedicated to the care of neurologically ill patients and staffed with physicians and nurses specially trained in the care of patients with brain, spine, and nervous system diseases.
Abstract:
The Healthcare Cost and Utilization Project (HCUP, pronounced "H-Cup") is a family of databases and related software tools and products developed through a Federal-State-Industry partnership and sponsored by AHRQ. HCUP includes the largest collection of multi-year hospital care (inpatient, outpatient, and emergency department) data in the United States, with all-payer, encounter-level information beginning in 1988, making it the nation’s most comprehensive source of hospital data. HCUP enables researchers, insurers, policymakers and others to study health care delivery and patient outcomes over time at the national, regional, state, and community levels. HCUP databases are derived from administrative data and contain encounter-level clinical and nonclinical information, including all listed diagnoses and procedures, discharge status, patient demographics, and charges for all patients regardless of payer (e.g., Medicare, Medicaid, private insurance, uninsured). These databases enable research on a broad range of health policy issues, including the cost and quality of health services, medical practice patterns, access to health care programs, and outcomes of treatments at the national, state, and local market levels. The HCUP databases are based on the data collection efforts of organizations in participating states that maintain statewide data systems and are partners with AHRQ. In this presentation, we will discuss how to build a clinical research project on this large database.
Yousri Slaoui
University of Poitiers
France
Biography:
Yousri Slaoui completed his PhD at the University of Versailles and Postdoctoral studies at the National Scientific Research Centre of Statistics. He is an Associate Professor at the University of Poitiers. He has published more than 12 papers in reputed journals.
Abstract:
In this talk, we propose an approach based on the stochastic expectation maximization (SEM) algorithm and Gibbs sampling to deal with the problem caused by censoring in the response of hierarchical random intercept models. As an application, we consider a dataset consisting of 2941 parasite density measurements gathered from a population of 505 Senegalese children between 2001 and 2003. Assuming that all these measurements are correct, we simulate the effect of various censoring levels by removing the corresponding entries before running our algorithm. The model residuals are then compared to those obtained with the full data. Even when 10%, 20% or even 30% of the original measurements are missing, the produced residuals remain very accurate, demonstrating the effectiveness of our approach. Moreover, we compared our approach with existing methods on real data sets as well as simulations; results showed that our approach outperformed the other approaches in terms of estimation accuracy and computing efficiency.
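The data-augmentation step at the heart of such samplers — re-drawing each censored response from its conditional truncated normal given the current parameters — can be sketched as follows (illustrative only, not the authors' full SEM/Gibbs implementation; all numeric values are hypothetical):

```python
import numpy as np
from scipy.stats import truncnorm

def impute_censored(mu, sigma, lod):
    """Draw a left-censored response (below detection limit `lod`) from
    N(mu, sigma^2) truncated to (-inf, lod)."""
    b = (lod - mu) / sigma          # standardized upper bound
    return truncnorm.rvs(-np.inf, b, loc=mu, scale=sigma)

# one Gibbs sweep over censored measurements (hypothetical subject-level values)
mu_i, sigma, lod = 2.0, 1.0, 1.5   # subject mean, residual SD, detection limit
print([impute_censored(mu_i, sigma, lod) for _ in range(3)])
```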
Abbas F Jawad
University of Pennsylvania
Children’s Hospital of Philadelphia
USA
Title: Cluster randomization trials in school settings
Biography:
Abbas F Jawad has earned his MSc (1986) and PhD (1993) from the Graduate School of Public Health, University of Pittsburgh, USA. He is an Associate Professor of Biostatistics in Pediatrics at the University of Pennsylvania, Perelman School of Medicine, and Department of Pediatrics at the Children’s Hospital of Philadelphia. He has published more than 100 papers in reputed journals and has been providing biostatistical support for medical pediatric research for more than 20 years.
Abstract:
Cluster Randomization Trials in School Settings (CRTSS) are increasingly utilized in studying new and/or existing behavioral and drug therapies targeting students within a school setting. The units of analysis are the students attending the schools, but randomization to the interventions is often done at the school level. Many such studies are conducted with a small number of schools, which precludes meaningful comparison at the school level. Designing, conducting, and analyzing CRTSS require accounting for sources of variation related to implementation of the proposed interventions, school seasons and years, and the therapists delivering the interventions. Other sources of variation are between-cluster (school) variation and within-school correlation. Power estimation and sample size calculation for CRTSS require special attention to the nested design of such trials, as sketched after this abstract. We will discuss and present recent studies that utilized CRTSS, such as a cluster randomized trial to evaluate external support for the implementation of positive behavioral interventions and supports by school personnel, the Preventing Relational Aggression in Schools Everyday Program, and a family-school intervention for children with ADHD.
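For the power and sample-size point, the standard cluster-randomization design effect (a textbook formula, included for orientation; the example numbers are hypothetical) inflates the individually randomized sample size:

```python
def crt_sample_size(n_individual, cluster_size, icc):
    """Inflate an individually randomized sample size by the design effect 1 + (m - 1) * ICC."""
    design_effect = 1 + (cluster_size - 1) * icc
    return n_individual * design_effect

# e.g., 400 students needed under individual randomization,
# 50 students per school, within-school ICC = 0.05
n = crt_sample_size(400, 50, 0.05)
print(n, n / 50)  # total students required and the implied number of schools
```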
Alfred Inselberg
Tel Aviv University
Israel
Title: Tutorial on Visualization and data mining for high dimensional datasets
Biography:
Alfred Inselberg received a PhD in Mathematics and Physics from the University of Illinois (Champaign-Urbana) and was a Research Professor there until 1966. He held research positions at IBM, where he developed a mathematical model of the ear (TIME, Nov. 74), concurrently holding joint appointments at UCLA, USC and later at the Technion and Ben Gurion University. Since 1995, he has been a Professor at the School of Mathematical Sciences at Tel Aviv University. He was elected Senior Fellow at the San Diego Supercomputing Center in 1996, Distinguished Visiting Professor at Korea University in 2008 and Distinguished Visiting Professor at the National University of Singapore in 2011. He invented and developed the multidimensional system of parallel coordinates, for which he received numerous awards and patents (on air traffic control, collision avoidance, computer vision, and data mining). The textbook ‘Parallel Coordinates: VISUAL Multidimensional Geometry and its Applications’, Springer (October) 2009, has a full chapter on data mining.
Abstract:
A dataset with M items has 2^M subsets, any one of which may be the one fulfilling our objectives. With a good data display and interactivity, our fantastic pattern-recognition ability can not only cut great swaths through this combinatorial explosion but also extract insights from the visual patterns. These are the core reasons for data visualization. With parallel coordinates (abbreviated ||-coords), the search for relations in multivariate datasets is transformed into a 2-D pattern recognition problem. The foundations are developed interlaced with applications. Guidelines and strategies for knowledge discovery are illustrated on several real datasets (financial, process control, credit-score, intrusion-detection, etc.), one with hundreds of variables. A geometric classification algorithm is presented and applied to complex datasets. It has low computational complexity and provides the classification rule explicitly and visually. The minimal set of variables required to state the rule (features) is found and ordered by their predictive value. Multivariate relations can be modelled as hyper-surfaces and used for decision support. A model of a (real) country’s economy reveals sensitivities, the impact of constraints, trade-offs, and economic sectors unknowingly competing for the same resources. An overview of the methodology provides foundational understanding: learning the patterns corresponding to various multivariate relations. These patterns are robust in the presence of errors, which is good news for the applications. We stand at the threshold of breaching the gridlock of multidimensional visualization. The parallel coordinates methodology has been applied to collision avoidance and conflict resolution algorithms for air traffic control (3 USA patents), computer vision (1 USA patent), data mining (1 USA patent), optimization, decision support and elsewhere.
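Attendees who want to experiment before the tutorial can draw a basic parallel-coordinates display with pandas (a plain plot for orientation; the tutorial's full methodology goes far beyond this, and the data here are synthetic):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(60, 4)), columns=["x1", "x2", "x3", "x4"])
df["class"] = np.repeat(["A", "B", "C"], 20)
df.loc[df["class"] == "B", "x2"] += 3      # give one class a visible pattern

parallel_coordinates(df, "class", colormap="viridis")  # each polyline = one observation
plt.show()
```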
Altaf H Khan
King Abdullah International Medical Research Center
National Guard Health Affairs
Saudi Arabia
Title: Workshop on Choice of agreement in medical imaging: Inter-rater or intra-rater reliability index, receiver operating characteristic curve, or any other index to rely on
Time : 11:30-13:00
Biography:
Altaf H Khan has completed three Master’s degrees, in Biostatistics (2004), Applied Mathematics (1999) and Mechanical Engineering (2003), from the University of Utah. Currently, he is working as a Senior Biostatistician at King Abdullah International Medical Research Center (National Guard Health Affairs), Riyadh, Saudi Arabia; prior to that he worked at the University of Utah Hospital and Prince Sultan Cardiac Center. He has many publications in international journals and proceedings.
Abstract:
Advances in medical imaging have revolutionized the healthcare industry, and the burden now rests on the shoulders of biomedical researchers and clinicians to draw fruitful inferences from this bulk of data. The fundamental question, and the crux of the issue, is how radiologists as observers can agree on some reliable index, such as intra- or inter-rater agreement, the Receiver Operating Characteristic (ROC) curve or any other reliable index, that would pave the way for making clinical decisions in the prognosis of an illness; the FDA (Food and Drug Administration) also strongly requires clinical trial studies to establish and support the efficacy of medical imaging agents. In this paper, an attempt has been made to discuss existing reliability indices such as Cohen’s kappa, weighted kappa, and the kappa used in triage systems, and a review has been made of other existing indices. ROC analysis is also discussed, with and without gold-standard medical imaging modalities, since there are no unanimous regulations from the manufacturing industries. Using PROC IML, macros have been written to compute a reliability index used in triage systems which takes into account the severity of mis-triage, with the reliability index calculated by applying an alternative weighting method. Computed reliability indices such as simple kappa, weighted kappa and triage kappa have been compared with the indices available in standard statistical software packages, namely SAS, R, Stata, etc. Reliability indices based on the Bayesian approach are also discussed.
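As a rough companion to the talk, the simple and weighted kappas it compares can be computed in a few lines of Python (a sketch with hypothetical ratings, standing in for the author's SAS PROC IML macros and the triage-weighted variant):

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical triage categories assigned by two raters to the same 10 cases.
rater_a = [1, 2, 2, 3, 1, 2, 3, 3, 1, 2]
rater_b = [1, 2, 3, 3, 1, 2, 3, 2, 1, 2]

simple_kappa = cohen_kappa_score(rater_a, rater_b)                          # unweighted Cohen kappa
linear_kappa = cohen_kappa_score(rater_a, rater_b, weights="linear")        # penalty grows with distance
quadratic_kappa = cohen_kappa_score(rater_a, rater_b, weights="quadratic")  # penalty grows with squared distance

print(simple_kappa, linear_kappa, quadratic_kappa)
```

The triage kappa of the talk replaces these standard weight matrices with one reflecting the severity of mis-triage, which is the part the author's custom macros supply.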
Al Omari Mohammed Ahmed
Al Baha University
Saudi Arabia
Title: Markov Chain Monte Carlo estimation for Bayesian approach based on type-I censored data
Time : 14:50-15:10
Biography:
Al Omari Mohammed Ahmed completed his PhD at the age of 31 years at Putra University of Malaysia. He is the Head of the Department of Mathematics in the Faculty of Arts and Sciences at Al Baha University. He has published more than 12 papers in reputed journals, and his interests include Bayesian statistics and survival analysis.
Abstract:
This study considers estimation for the Weibull distribution with type-I censored data, using the maximum likelihood estimator and Bayesian estimators under Jeffreys’ prior and an extension of Jeffreys’ prior information. The maximum likelihood estimate of the shape parameter is not available in closed form, although it can be obtained by numerical methods. Moreover, the Bayesian estimates of the parameters and of the survival and hazard functions cannot be obtained analytically; for that reason Markov Chain Monte Carlo is used, where the full conditional distributions for the scale and shape parameters are obtained via Gibbs sampling and the Metropolis-Hastings algorithm, followed by estimation of the survival and hazard functions. The methods are compared to the Bayesian approach using Lindley’s approximation and to their maximum likelihood counterparts, and the comparisons are made with respect to the Mean Square Error (MSE) and absolute bias to determine the best estimates of the parameters, the survival and the hazard functions.
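A minimal sketch of the MCMC idea, assuming a small hypothetical type-I censored sample: censored units contribute the survival term log S(T) to the likelihood. For brevity it uses random-walk Metropolis-Hastings on both parameters with a flat prior, rather than the Gibbs steps and Jeffreys-type priors of the study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical type-I censored sample: failures observed before censoring time T.
T = 2.0
times = np.array([0.3, 0.7, 1.1, 1.6, 1.9])   # observed failure times (all <= T)
n_censored = 3                                 # units still surviving at T

def log_lik(shape, scale):
    if shape <= 0 or scale <= 0:
        return -np.inf
    z = (times / scale) ** shape
    ll = np.sum(np.log(shape / scale) + (shape - 1) * np.log(times / scale) - z)
    ll += n_censored * (-(T / scale) ** shape)   # each censored unit contributes log S(T)
    return ll

# Random-walk Metropolis-Hastings on (shape, scale) under a flat prior.
cur = np.array([1.0, 1.0])
cur_ll = log_lik(*cur)
samples = []
for _ in range(5000):
    prop = cur + rng.normal(scale=0.1, size=2)
    prop_ll = log_lik(*prop)
    if np.log(rng.uniform()) < prop_ll - cur_ll:
        cur, cur_ll = prop, prop_ll
    samples.append(cur.copy())

shape_hat, scale_hat = np.mean(samples[1000:], axis=0)  # posterior means after burn-in
print(shape_hat, scale_hat)
```

Posterior draws of the parameters can then be pushed through S(t) and h(t) to estimate the survival and hazard functions, as the abstract describes.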
Yanhong Gao
Chinese PLA General Hospital
China
Title: Establishment and verification of miRNAs expression molecules for breast cancer and its distant metastasis
Time : 15:10-15:30
Biography:
Yanhong Gao completed her PhD at the Chinese Academy of Medical Sciences in 2005. She has worked as an Associate Chief Physician and Associate Professor at the Department of Clinical Biochemistry of the Chinese PLA General Hospital since 2005. She is interested in finding new tumor biomarkers for diagnosis and prognosis from blood using advanced technology methods (including biometrics and biostatistics). She has published more than 25 papers in journals.
Abstract:
Breast cancer is one of the most common malignant tumors in women, and distant metastasis is the main cause of death for breast cancer patients. Early laboratory detection is the key to the prevention and therapy of breast cancer. MicroRNAs are a large class of single-stranded endogenous non-coding small RNAs. It has been reported that cells may release endogenous tumor microRNAs into the peripheral blood, where they become circulating microRNAs during tumor genesis and development. We have analysed serum microRNAs of breast cancer and breast cancer metastasis patients using a new screening strategy and biostatistical methods. We identified hsa-miR-6090 and hsa-miR-451a as candidate molecules for breast cancer and breast cancer metastasis. Therefore, we speculate that hsa-miR-6090 and hsa-miR-451a play an important role in the development and metastasis of breast cancer, and further studies to verify their biological functions are ongoing.
Brent Spruill
Walden University
USA
Title: Increasing obesity rates among adolescents in the State of Massachusetts
Biography:
Brent Spruill is an Assistant Director of Player Personnel at the Colorado Crush. He works with all students in Anatomy and Physiology and Statistics. He is responsible for scouting all professional football teams across the country in the National Football League (NFL) and the Canadian Football League (CFL) for possible candidates for his organization.
Abstract:
Increasing obesity rates among adolescents in the State of Massachusetts are of concern to public-health professionals. High bullying rates may contribute to obesity. Guided by Maslow's safety component and Bandura's social-cognitive theory, this study investigated the relationship between hours spent watching television, bullying, and meeting physical-activity guidelines among Massachusetts adolescents. The association between the dependent variable (physical inactivity) and the independent variables (hours spent watching television and bullying) was explored using data from the 2009 Massachusetts Youth Risk Behavior Survey. Participants were 2,601 Massachusetts adolescents aged 13 to 18. Statistical analysis included chi-square, the Kruskal-Wallis test, the Mann-Whitney U test, and Spearman correlation. Results revealed a significant negative correlation between television watching and physical activity, suggesting that the more hours students spent watching television, the less active they tended to be. The Kruskal-Wallis test showed a significant difference in hours of television watching by level of physical activity. To determine where the statistical differences lay, 3 pair-wise Mann-Whitney U tests were conducted; 2 were shown to be statistically significant. Physical activity and bullying were significantly associated. The results of the Mann-Whitney U test were significant, indicating that levels of activity for students who were not bullied were higher than those for students who were bullied. The social-change potential of this study is a better understanding of the relationship between bullying and physical inactivity among public-health professionals, in an increased effort to remove barriers to physical activity, help limit bullying, and increase the health and welfare of adolescents.
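The nonparametric tests the study reports are all available in scipy; a minimal sketch with hypothetical numbers (not the survey data) shows the calls:

```python
from scipy.stats import kruskal, mannwhitneyu, spearmanr

# Hypothetical hours of TV watching grouped by physical-activity level (low/medium/high).
low = [5, 6, 4, 7, 5]
med = [3, 4, 3, 5, 4]
high = [1, 2, 2, 3, 1]

h_stat, p_kw = kruskal(low, med, high)    # overall difference across the 3 groups
u_stat, p_mw = mannwhitneyu(low, high)    # one pair-wise follow-up comparison
rho, p_rho = spearmanr([5, 6, 4, 7, 3, 4, 1, 2],
                       [0, 0, 1, 0, 2, 1, 3, 3])  # TV hours vs. activity score

print(p_kw, p_mw, rho)
```

A significant Kruskal-Wallis result followed by pair-wise Mann-Whitney U tests mirrors the sequence described in the abstract.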
Ken Williams
KenAnCo Biostatistics
University of Texas
USA
Title: A workshop on how to do meta-analysis right
Time : 15:45-16:45
Biography:
Ken Williams received a BS in Applied Math from Georgia Tech in 1971 and an MS in Operations Research from the Air Force Institute of Technology in 1980. He served in the US Air Force for 22 years in Computer Systems and Scientific Analysis. He also served 10 years as a Biostatistician at the University of Texas Health Science Center at San Antonio, where he remains an Adjunct Faculty Member. He has been a Freelance Biostatistician with KenAnCo Biostatistics since 2007. Designated as a Professional Statistician (PStat) in the inaugural 2011 class, he has published more than 100 papers in peer-reviewed journals.
Abstract:
This workshop will provide a brief overview of important features of meta-analysis. Topics will include: choosing between a fixed effect and a random effects model; accounting for correlations between statistics being compared; assessing the potential for bias; conducting subgroup analyses; doing meta-regression analysis; comparing the advantages and disadvantages of using published statistics versus individual-level data; doing Bayesian meta-analysis; choosing among available meta-analysis software; and applying parameters estimated by meta-analysis to support public health policy decisions. Examples will be provided from three published meta-analyses. One included all 12 published reports from epidemiological studies that contained estimates of the relative risks of LDL-C, non-HDL-C, and apoB predicting fatal or nonfatal ischemic cardiovascular events. Another meta-analysis included 7 placebo-controlled statin trials in which LDL-C, non-HDL-C, and apoB values were available. The workshop leader was the lead analyst for these first two sample meta-analyses. The third sample meta-analysis was conducted by the Emerging Risk Factors Collaboration using individual-level epidemiological data from 3 studies which had published the relevant statistics and 23 which had not. All these sample meta-analyses were published in various journals. The workshop will wrap up with a discussion of how irreconcilable conclusions may be derived from different meta-analyses ostensibly pursuing the same objective.
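As a concrete anchor for the fixed-versus-random-effects topic, a minimal Python sketch (hypothetical effect sizes, not the published meta-analyses) computes both pooled estimates, with the DerSimonian-Laird between-study variance driving the difference:

```python
import numpy as np

# Hypothetical per-study log relative risks and their standard errors.
effects = np.array([0.25, 0.40, 0.10, 0.32, 0.18])
se = np.array([0.10, 0.15, 0.12, 0.08, 0.20])

w_fixed = 1 / se**2
theta_fixed = np.sum(w_fixed * effects) / np.sum(w_fixed)     # fixed-effect pooled estimate

# DerSimonian-Laird estimate of the between-study variance tau^2
q = np.sum(w_fixed * (effects - theta_fixed) ** 2)
c = np.sum(w_fixed) - np.sum(w_fixed**2) / np.sum(w_fixed)
tau2 = max(0.0, (q - (len(effects) - 1)) / c)

w_random = 1 / (se**2 + tau2)
theta_random = np.sum(w_random * effects) / np.sum(w_random)  # random-effects pooled estimate
se_random = np.sqrt(1 / np.sum(w_random))

print(theta_fixed, theta_random, se_random)
```

When tau^2 is zero the two estimates coincide; heterogeneous studies widen the random-effects standard error, which is one of the choices the workshop examines.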
Martial Longla
University of Mississippi
USA
Title: An objective Bayesian estimation of parameters in a log-binomial model
Biography:
Martial Longla received several diplomas at the Peoples' Friendship University of Russia: Teacher of Russian as a Foreign Language, Interpreter/Translator with 3 languages (French, Russian, and English) with honors, Bachelor of Sciences and Master of Sciences in Mathematics, and several university and city awards for his leadership in the fight for students' rights and the promotion of African culture. He completed a PhD program in Moscow on Optimal Control Problems in Infinite Dimensional Spaces in 2008 and moved to the University of Cincinnati, where he obtained a PhD in Mathematics in 2013. He joined the Department of Mathematics at the University of Mississippi in August 2013.
Abstract:
The log-binomial model is commonly recommended for modeling prevalence ratios, just as logistic regression is used to model log odds-ratios. However, for the log-binomial model the parameter space is restricted, causing difficulties for maximum likelihood estimation in terms of convergence of numerical algorithms and calculation of standard errors. The Bayesian approach is a natural choice for the log-binomial model, as it involves neither maximization nor large-sample approximation. We consider two objective, or non-informative, priors for the parameters in a log-binomial model: an improper prior and a proper prior. We give sufficient conditions for the posterior from the improper flat prior to be proper, and compare the two priors in terms of the resulting posterior summaries. We use Markov Chain Monte Carlo via slice sampling to simulate from the posterior distributions. An overview of recent contributions to this problem will be provided. We will also present questions involving dependence.
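The restricted parameter space is easy to see in code. A minimal sketch with simulated data (using random-walk Metropolis with a flat prior for brevity, rather than the slice sampler of the study) simply assigns zero posterior mass whenever a fitted probability would exceed one:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: binary outcome y and one covariate x (plus an intercept).
x = np.column_stack([np.ones(50), rng.uniform(0, 1, 50)])
y = rng.binomial(1, 0.3, 50)

def log_post(beta):
    eta = x @ beta              # log-binomial link: log p_i = x_i' beta
    if np.any(eta >= 0):        # restricted parameter space: p_i < 1 for every subject
        return -np.inf
    p = np.exp(eta)
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))  # flat prior: posterior = likelihood

beta = np.array([-1.0, 0.0])
ll = log_post(beta)
draws = []
for _ in range(5000):
    prop = beta + rng.normal(scale=0.05, size=2)
    pll = log_post(prop)
    if np.log(rng.uniform()) < pll - ll:
        beta, ll = prop, pll
    draws.append(beta.copy())

print(np.mean(draws[1000:], axis=0))  # posterior means after burn-in
```

The sampler never leaves the valid region, which is exactly the difficulty that trips up unconstrained maximum likelihood algorithms for this model.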
Alfred Inselberg
Tel Aviv University
Israel
Biography:
Alfred Inselberg received a PhD in Mathematics and Physics from the University of Illinois (UICU). He was Graduate Assistant at the Biological Computer Lab (BCL), where research on brain function, cognition and learning was carried out (coupled to McCulloch’s lab at MIT of neural networks fame), and continued at BCL as Research Assistant Professor. During 1966-1995, he was an IBM Researcher (reaching a rank just below Fellow) at the Los Angeles Scientific Center and later Yorktown Labs. He developed a Mathematical Model of the (Inner) Ear (TIME, Newsweek 1974, etc.) while concurrently teaching at UCLA and USC. He joined the Technion’s faculty 1971-73 and Ben Gurion University 1977-83, and has been at Tel Aviv University since 1995. He was elected Senior Fellow in Visualization at the San Diego Supercomputing Center (1996), and Distinguished Visiting Professor at Korea University (2008) and the National University of Singapore (2011). He invented the multidimensional visualization methodology of parallel coordinates, which has become widely accepted and applied (air traffic control, data mining, etc.). His textbook on the subject is published by Springer.
Abstract:
I want to be stunned by a visualization discovery … a WOW moment! And I do not mean that some variable values turned out to be much different than expected, or that the location of an event was different than expected, etc., but rather that something far-reaching we had no idea existed was found, something like … penicillin! This should be the measure of visualization’s success. And just how do we do that? For one thing luck helps but, as I tell my students, “When you work harder your luck … improves”! For a dataset with M items there are 2^M possible subsets, any one of which may turn out to be the one satisfying our objectives. With our fantastic pattern-recognition ability we can cut great swaths through this combinatorial explosion, discovering patterns corresponding to relational information from a good data display. This is something that simply cannot be automated … thank goodness! Patterns are geometrical creatures and so we need to learn geometry. Actually, from our point of view we are not interested in rigid patterns but malleable ones, e.g. “gaps”, which can be different in shape but are gaps nonetheless. That is, we are really interested in the topology of the patterns. It has been shown that multidimensional patterns cannot be discovered directly from their points. Rather, they can be synthesized from lower-dimensional information. Even in 3-D, we learn to look at planes not by their points but by their planar surface/shape consisting of their lines, and ditto for surfaces. We need to discuss and adopt a rigorous syllabus for the discipline of visualization involving geometry, topology and cognition, among others. This is our best investment for the future. Research on the geometry and topology induced by ||-coords has made great strides. Many patterns corresponding to multivariate relations have been discovered. We have embarked on a project to transform these results into powerful tools for our exploration and data mining arsenal. They revolutionize the power of modern ||-coords.
Muhammad Salman Bashir
King Fahad Medical City
Saudi Arabia
Title: Biostatistics and clinical research: Psychological burden and low level of knowledge among medical practitioners at King Fahad Medical City, Riyadh
Biography:
Muhammad Salman Bashir has an MSc in Statistics and is a Certified Clinical Research Associate of the Canadian Association of Clinical Research. He is working at King Fahad Medical City in the capacity of Biostatistician Specialist I at the Research Center, and has seven years of experience with hospitals and pharmaceutical products and their local clinical trials. He has more than six publications in local and international journals.
Abstract:
Background: Physicians, particularly those with no formal education in epidemiology and biostatistics, have a poor understanding of common statistical tests and limited ability to interpret study results. Fundamental concepts of biostatistics and epidemiology are poorly grasped by physicians, and if physicians do not fully understand these primary concepts, the conclusions they reach are more likely to be wrong. Objective: To evaluate the level of knowledge and awareness of basic and advanced biostatistics and epidemiology among physicians, residents, clinicians and researchers at King Fahad Medical City. Methodology & Design: A cross-sectional descriptive study design was used. The survey was completed by 250 participants. The target sample comprised all physicians, clinicians, residents, researchers and interns, both male and female, from different departments, who were practicing and working in their OPDs, emergency, clinics and other facilities. Result: The initial pilot survey was completed by only 250 participants from 8 departments and 3 faculties. The overall mean percentage of correct answers on statistical knowledge and biostatistics was 31.8% [95% CI, 28.6%-38.2%], in contrast to 65.6% [95% CI, 58.3%-72.1%] for research fellows and general medicine faculty with research training, a difference that is highly statistically significant (p<0.001). Higher scores among residents were associated with additional advanced degrees, 48.3% [95% CI, 45.6%-55.8%] in comparison with 42.5% [95% CI, 38.3%-44.6%] (p<0.001). Conclusion: A large number of medical practitioners have a low level of knowledge of biostatistics and are unable to interpret basic and advanced statistical concepts commonly found in the medical literature. A formalized teaching system for biostatistics and epidemiology during residency will be required for better understanding of, and proficiency in, statistical information.
Edwin M M Ortega
Universidade de São Paulo
Brazil
Title: A power-series beta Weibull regression model for predicting breast carcinoma
Biography:
Edwin M M Ortega received a PhD in Statistics from the University of São Paulo in 2002 and is a Full Professor at the University of São Paulo, Brazil. He has published more than 130 papers in internationally refereed journals. He has experience in probability and statistics, focusing on parametric inference and acting on the following subjects: distribution theory, survival analysis, residual analysis and sensitivity analysis.
Abstract:
The postmastectomy survival rates are often based on previous outcomes of large numbers of women who had the disease, but they do not accurately predict what will happen in any particular patient's case. Pathologic explanatory variables such as disease multi-focality, tumor size, tumor grade, lymphovascular invasion and enhanced lymph node staining are prognostically significant for predicting these survival rates. We propose a new cure rate survival regression model for predicting breast carcinoma survival in women who underwent mastectomy. We assume that the unknown number of competing causes that can influence the survival time is given by a power series distribution, and that the time for the tumor cells left active after the mastectomy to metastasize follows the beta Weibull distribution. The new compounding regression model includes, as special cases, several well-known cure rate models discussed in the literature. The model parameters are estimated by maximum likelihood. Further, for different parameter settings, sample sizes and censoring percentages, some simulations are performed. We derive the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes and present some ways to assess local influence.
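The competing-causes construction can be sketched directly. Assuming the Poisson member of the power series family for the number of active cells and inverse-transform sampling for the beta Weibull times (all parameter values below are hypothetical), cured subjects are those left with no active cells:

```python
import numpy as np

rng = np.random.default_rng(2)

def rbeta_weibull(n, a, b, shape, scale):
    # Inverse transform: if U ~ Beta(a, b), then T = G^{-1}(U), with G the Weibull CDF,
    # has the beta Weibull distribution.
    u = rng.beta(a, b, n)
    return scale * (-np.log1p(-u)) ** (1 / shape)

def simulate_cure_times(n, theta=1.5, a=2.0, b=1.5, shape=1.2, scale=3.0):
    times = np.full(n, np.inf)            # np.inf marks cured subjects (no active cells)
    m = rng.poisson(theta, n)             # Poisson: one member of the power series family
    for i in range(n):
        if m[i] > 0:
            # The observed time is the minimum over the competing causes.
            times[i] = rbeta_weibull(m[i], a, b, shape, scale).min()
    return times

t = simulate_cure_times(1000)
print("cure fraction:", np.mean(np.isinf(t)))  # approx exp(-theta) in the Poisson case
```

Swapping in other power series members (binomial, geometric, logarithmic) for the Poisson count recovers the different special cases the abstract alludes to.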
Khalaf S Sultan
King Saud University
Saudi Arabia
Title: Robust regression analysis using re-descending M and MM estimators based on modified Cauchy function
Biography:
Khalaf S Sultan is a Professor at King Saud University, Saudi Arabia. He earned BS in Mathematics from Assuit University Egypt, Master’s degree in Mathematical Statistics from Assuit University, Egypt and PhD in Statistics from Al-Azhar University Egypt under the channel system with McMaster University, Canada. He has published journal and conference papers. His research interests include statistical inference, modeling and simulation, optimization, reliability and mixture models.
Abstract:
The author proposes a modified Cauchy function that can be used to develop re-descending M and MM estimators in robust regression. The proposed modified Cauchy estimator competes with Tukey’s biweight and Qadir’s beta functions, resulting in enhanced efficiency. In addition, to show the usefulness of the proposed technique, some Monte Carlo simulation experiments are carried out. Further, the findings are applied to a real data set.
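The modified Cauchy function itself is the paper's contribution and is not in standard libraries, but the Tukey biweight benchmark it competes with is; a minimal sketch fitting a re-descending M-estimator to contaminated data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)

# Hypothetical data with a few gross outliers in y.
x = rng.uniform(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(scale=0.5, size=100)
y[:5] += 15                      # contaminate 5 observations

X = sm.add_constant(x)
m_est = sm.RLM(y, X, M=sm.robust.norms.TukeyBiweight()).fit()  # re-descending M-estimator
print(m_est.params)              # intercept/slope resistant to the outliers
```

Because a re-descending psi-function gives zero weight to extreme residuals, the five contaminated points barely move the fit, whereas ordinary least squares would be dragged toward them.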
Yusuf O B
University of Ibadan
Nigeria
Title: An insight on the misuse of logistic regression model in the face of non-convergence
Biography:
Yusuf O B is a student at the University of Ibadan, Nigeria. She has expertise in medical statistics, and her research interests include: mathematical epidemiology of infectious diseases (malaria), multilevel modeling and analyses of longitudinal data.
Abstract:
Background: The logistic regression model is widely used in health research for descriptive and predictive purposes. Unfortunately, researchers are sometimes not aware that the underlying principles of the technique have failed when the algorithm for maximum likelihood does not converge. Young researchers, particularly postgraduate students, may not know why the separation problem, whether quasi- or complete, occurs, how to identify it and how to fix it. Objective: This study was designed to critically evaluate convergence issues in articles that employed logistic regression analysis published in an African medical journal between 2004 and 2013. Methods: Problems of quasi- or complete separation were described and illustrated with the National Demographic and Health Survey dataset. An assessment of articles that employed logistic regression was conducted. Results: A total of 581 articles were published, of which 40 (6.9%) used binary logistic regression. However, 24 (60.0%) stated the use of logistic regression in the methodology, while only 3 (12.5%) of these properly described the procedures. None of the articles assessed model fit, while the majority presented insufficient details of the procedures. In addition, of the 40 that used logistic regression, the problem of convergence occurred in 6 (15.0%) of the articles. Conclusion: Logistic regression tended to be poorly implemented in studies published between 2004 and 2013. Our findings showed that the procedure may not be well understood by researchers, since very few described the process in their reports, and they may be totally unaware of the problem of convergence or how to deal with it.
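Complete separation is easy to reproduce. In the toy example below (not the survey data), x perfectly predicts y, and statsmodels refuses to return estimates rather than silently reporting a non-converged fit:

```python
import numpy as np
import statsmodels.api as sm

# Complete separation: x >= 4 perfectly predicts y = 1, so the MLE does not exist
# (the likelihood keeps increasing as the slope grows without bound).
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])
X = sm.add_constant(x)

try:
    sm.Logit(y, X).fit(disp=0)
except Exception as e:           # statsmodels detects this, e.g. PerfectSeparationError
    print(type(e).__name__, e)
```

Remedies discussed in the separation literature include penalized (Firth-type) likelihood, exact logistic regression, or collapsing the offending covariate; the point of the abstract is that published analyses rarely report having checked for the problem at all.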
- Track 7: Modern Data Analysis; Track 8: Adaptive Biometric Systems; Track 9: Biometric Security; Track 10: Emerging and Future Applications of Biometrics; Track 11: Cyber Security
Session Introduction
Xing Li
Mayo Clinic
USA
Title: Large data integration in RNA-seq and whole genome sequencing to prioritize disease-related genes in hypoplastic left heart syndrome
Time : 09:35-09:55
Biography:
Xing Li has completed his PhD in Bioinformatics from The University of Michigan at Ann Arbor. He is an Assistant Professor in the Division of Biomedical Statistics and Informatics, Department of Health Science Research at Mayo Clinic, which has been recognized as the best hospital for 2014-2015 by U.S. News & World Report. He has published more than 17 papers in reputed journals and has been serving as a reviewer for many journals, such as Genomics, BMC Bioinformatics, Stem Cell Research, PLOS ONE, Physiological Genomics, etc.
Abstract:
Applying high-throughput next-generation sequencing technology - RNA-seq and whole genome sequencing (WGS) - provides an unprecedented opportunity to investigate disease-specific transcription profiles linked to potential genetic causes in hypoplastic left heart syndrome (HLHS). Bioengineered HLHS patient-specific iPSCs and differentiated cardiac tissues offer a platform to recapitulate the individual developmental process and study the molecular causes of the disease. In this study, transcriptome profiling using RNA-seq was performed on iPSCs and differentiated cardiomyocytes. The RNA-seq data revealed over 4000 and 6000 differentially expressed genes between the family members in iPSCs and differentiated cells, respectively. WGS was done on blood samples of the proband and parents to identify millions of variants. Variant filtering according to rarity, predicted damage and mode of inheritance pinpointed 34 genes with uncommon variants potentially involved in the pathogenesis of the disease. Ten of the 34 mutated genes displayed transcriptional differences in iPSCs, while 16 of the 34 mutated genes showed significantly differential expression in differentiated cells. Expression profiles for genes fulfilling both criteria (9 in total) were further characterized in iPSCs from the proband and controls in a guided time-course cardiac differentiation. Two genes, ELF4 and HSPG2, displayed significantly different profiles. Of note, none of these genes had been previously linked to HLHS. In summary, by integrating the data from WGS, RNA-seq and a time-course developmental roadmap, we triangulated a list of prioritized candidate genes that may contribute to HLHS and could be targets for future mechanistic studies for disease-specific clinical applications.
Huda Al-Ghaib
Utah Valley University, USA
Title: Structural similarity index algorithm for accurate mammogram registration
Time : 09:55-10:15
Biography:
Huda Al-Ghaib is an Assistant Professor. She received her undergraduate degree in Computer Engineering from the University of Technology, Baghdad, Iraq, in 2006, and is a recipient of a Fulbright Scholarship in 2009. She earned her Master’s and PhD degrees in Electrical Engineering in 2011 and 2015, respectively, from the University of Alabama in Huntsville (UAH). During her graduate studies, she was awarded Outstanding Graduate Student in Engineering in 2014. Her research interests are in the area of pattern recognition and data mining with applications in medical imaging. She is the author/co-author of more than 10 journal and conference articles. She is a Member of IEEE.
Abstract:
The American Cancer Society documented that around 240,000 women were diagnosed with breast cancer in the United States during 2015, with around 40,000 deaths; 1 out of 8 women will develop breast cancer during her lifetime. Unfortunately, early breast cancer is largely asymptomatic. Screening mammography is a widely used procedure in developed countries to fight breast cancer. In this procedure, women within a certain age range are recommended to undergo screening mammography regularly to search for abnormalities such as masses, calcifications, and architectural distortions. One challenge is detecting subtle malignant abnormalities in consecutive, temporal images acquired for the same patient over time. Currently, a radiologist visually compares temporal mammograms to search for these subtle changes, which is a time-consuming procedure. Also, limitations of the human visual system can lead to misinterpretation of the temporal mammograms and hence produce false negatives and false positives. One method to increase the accuracy of temporal mammogram registration is to apply computational algorithms to detect these subtle changes. In this research, the structural similarity index measurement (SSIM) is applied to register temporal mammograms for that purpose. Factors such as image rotation and translation are taken into consideration in this algorithm. The algorithm is compared with two well-known algorithms, i.e., Mutual Information (MI) and Correlation Coefficients (CORR). Based on the radiologist's outcome, our algorithm provided better results compared with the other two algorithms. Using metric measurements, SSIM was found to reduce the error rate to 59.3%, compared with 61.1% and 63.2% for CORR and MI, respectively.
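A minimal sketch of the core measurement, using scikit-image's SSIM on synthetic patches (not the clinical pipeline or its rotation/translation search):

```python
import numpy as np
from skimage.metrics import structural_similarity

rng = np.random.default_rng(4)

# Hypothetical pair of temporal mammogram patches (prior and current exam).
prior = rng.random((128, 128))
current = prior + rng.normal(scale=0.05, size=(128, 128))  # slightly changed tissue

score, diff = structural_similarity(
    prior, current,
    data_range=current.max() - current.min(),
    full=True,
)
print("SSIM:", score)   # near 1 for well-aligned, similar images
# 'diff' is a local similarity map; low-valued regions flag subtle changes for review.
```

In a registration loop, candidate rotations and translations of the prior image would be scored this way, keeping the transform that maximizes SSIM.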
Adeola Stephen Oladele
Federal University of Technology
Nigeria
Title: Fuzzy-based multi-modal architecture for electioneering in Nigeria
Time : 10:15-10:35
Biography:
Oladele Stephen Adeola is a certified Digital & Computer Forensic Investigator (CFI) as well as an Oracle Certified Professional (OCP). He holds a PhD degree in Computer Science and is a member of professional bodies such as the Computer Forensic Institute, Nigeria (CFIN), Computer Professionals of Nigeria (CPN), Nigeria Computer Society (NCS), Institute of Electrical and Electronics Engineers (IEEE), International Association of Engineers (IAENG) and Association for Computing Machinery (ACM). He has worked in different companies as a Network Engineer as well as a Programmer. His research interests include intelligent systems, computer imaging, biometrics, land information systems, digital & computer forensics and database systems. Adeola has published works in a number of local and international journals. He also reviews for local and international journals such as the Journal of Information Technology and General Studies (Rufus Giwa Polytechnic, Nigeria), Net Journal of Social Sciences (Nigeria), Information Technology & People (Emerald Publications, United Kingdom), and the Journal of Educational Research and Reviews (JERR, United Kingdom). He has served, at various times, as an Information Technology consultant to a number of establishments in Nigeria, including ALCATEL Nigeria, the Nigeria Police Force Information Technology unit, and the Ondo State Property and Development Corporation.
Abstract:
The 2015 election has been adjudged the best in the history of the electioneering process in Nigeria, thanks to the deployment of information technology. But in reality the election was not, after all, as flawless as many would want to believe. There were many problems associated with it, particularly in the areas of verification and authentication of eligible voters by the fingerprint reader. This paper examines these problems and proposes a fuzzy-based multi-modal architecture for future elections in Nigeria. The architecture is based on the extraction of fingerprint and iris features of prospective voters. Also examined are the features of fingerprints and irises as they relate to authentication for electioneering purposes. Further discussed are the advantages of the proposed architecture over the present Independent National Electoral Commission (INEC) method of voter authentication.
Uchendu Bartholomew A
Federal Polytechnic
Nigeria
Title: Application of multivariate statistical methods in the study of morphological features of Tilapia cabrea
Time : 10:35-10:55
Biography:
Abstract:
Data were collected on the morphological features of Tilapia cabrea; the weights and lengths were measured in grams and millimeters, respectively. The data were subjected to multivariate analysis. Principal component analysis showed that four principal components accounted for about 88% of the total variability, viz: body weight (X3), body depth (X8), snout length (X6) and standard length (X2). The analysis also identified total length (X1) as one of the least contributors to the size of the fish. This could be justified, as it is known that the fins and the distance between the anterior and posterior extremities of the mouth of the fish are like chaff and contribute little weight.
Ali Alkhalifah
Qassim University College of Computer
Saudi Arabia
Title: Online Identity: The trajectory of the web-based identity management systems migration
Time : 11:10-11:30
Biography:
Ali Alkhalifah received a BS in Computer Science from Qassim University in 2007, a Master’s (with honors) in IT from the University of Newcastle in 2010, and a PhD in Information Systems from the University of New South Wales, Australia, in 2013. He is an Assistant Professor in the Computer College at Qassim University. Until recently he was the Head of the IT Department. He has been involved in several program committees and serves as a reviewer for different international conferences and journals. Ali has a number of research interests, including e-business, identity management systems, evaluation of the World Wide Web, and the Semantic Web.
Abstract:
Web-based identity management systems (IdMS), a new and innovative information technology (IT) artefact, involve the integration of emerging technologies and business processes to create identity-centric approaches for the management of users, their attributes, authentication factors and security privileges across the Internet within multiple websites. With the growth of online identities on the Internet, IdMS enable the use of the same user data, managed identifiers and authentication credentials across multiple websites, reducing the number of identifiers (e.g. passwords) and profiles with which a user has to deal. As digital identity becomes more and more important in the online world, the emergence of IdMS has brought about primary changes to different online contexts. The trajectory of the IdMS migration can be understood in relation to the proprietary system and the openness of the system to exchanging identity information. This study makes the distinction and classification of three types of IdMS models - the isolated model, the centralized model and the decentralized model - because of their differences in architecture and standards and their different impacts on security, privacy and usability issues. We develop guidelines for IdMS designers and provide for the employment of more targeted implementation efforts. We also discuss some implications and highlight some opportunities for creating and enhancing new IdMS.
Abdullah K Alqallaf
Kuwait University, Kuwait
Title: Tackling the big-data challenges of genomics by statistical and computational-based models
Time : 11:30-11:50
Biography:
Abdullah K Alqallaf is an Assistant Professor at the Electrical Engineering Department, College of Engineering and Petroleum, Kuwait University. His research interests include microwave imaging techniques and analysis for tumor detection; genomics signal processing and bioinformatics; statistical and wavelet signal processing; speech and multimedia signal processing; signal processing for communications and networking; machine learning for signal processing; design and implementation of signal processing systems; image and multidimensional signal processing; and medical image analysis (feature extraction, feature selection, segmentation, detection/estimation and classification).
Abstract:
Data on genome structural and functional features for various organisms are being accumulated and analyzed, with the aim of exploring the biological information in depth and converting data into meaningful biological knowledge. Recent developments in experimental technologies and approaches, such as microarrays and DNA sequencing, generate high-resolution genetic data and make it possible to enhance our understanding of the complex dynamic interactions between complex diseases and biological systems through computational models. My talk will be about how the choice of statistical algorithms for processing big genomic data may affect the findings, and how this may lead to better diagnostic directions.
Rosa V Dacosta
The Food Trust
USA
Title: Growing the field: Current approaches to data collection at farmers’ markets
Biography:
Rosa V Dacosta has completed her MPH from Drexel University School of Public Health, Philadelphia, USA. She has been involved with over 4 publications in the fields of vascular biology and public health.
Abstract:
There is limited published research about the dietary impacts of farmers' markets. We sought to understand whether market managers collect data about markets and to examine the instruments and strategies used. Of the 359 market managers contacted across the United States, representing 543 markets, 185 managers participated in a telephone survey. A subset supplied copies of data collection tools for further analysis. Ninety-three percent of market managers collect data such as customer surveys, vendor applications, customer counts, or demographics. The potential utility of the data collected by managers and suggestions for studying the dietary impacts of farmers' markets are discussed.
Vu Thi Kim Ngoc
Center of Analytical Services and Experimentation of HCMC (CASE)
Vietnam
Title: Statistical models for colorectal cancer screening using urine NMR spectra
Biography:
Vu Thi Kim Ngoc is Vice Director of the Center of Analytical Services and Experimentation of Ho Chi Minh City, Vietnam. She is an NMR specialist in biomolecular structures and interactions (protein, DNA…) with 18 years of experience. She is currently developing statistical and computational methods in analytical chemistry in Vietnam.
Abstract:
Colorectal cancer (CRC) is one of the most common types of cancer, and detecting CRC at an early stage improves survival rates dramatically. Statistical models for CRC identification were built by metabolomics, based on 1H NMR data of urine. Principal component analysis (PCA) and partial least squares (PLS) were applied to the urine NMR data of 64 cases and 76 controls collected at the MEDIC Centre (HCMC, Vietnam). Specific differences were observed, in particular in the spectral ranges corresponding to some metabolites. This analysis was designed to compare and verify important metabolic alterations between CRC patients and healthy persons, and it could be extended to the diagnosis of colorectal cancer based on the profile of common and abundant metabolites. Keywords: colorectal cancer, metabolomics, 1H NMR, multivariate statistics, PCA, PLS.
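A minimal sketch of the PCA/PLS-DA workflow on synthetic spectra (the bin count, labels and discriminating region below are hypothetical, not the MEDIC Centre data):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(5)

# Hypothetical binned 1H NMR spectra: 140 urine samples x 200 spectral bins,
# labeled 1 for the 64 CRC cases and 0 for the 76 controls.
X = rng.normal(size=(140, 200))
y = np.r_[np.ones(64), np.zeros(76)]
X[y == 1, 10:15] += 0.8          # a hypothetical discriminating metabolite region

scores = PCA(n_components=2).fit_transform(X)   # unsupervised overview of the spectra
pls = PLSRegression(n_components=2).fit(X, y)   # PLS-DA: regress class labels on spectra
pred = (pls.predict(X).ravel() > 0.5).astype(int)
print("training accuracy:", (pred == y).mean())
```

PCA gives the unsupervised separation picture, while the PLS loadings point to the spectral bins, and hence candidate metabolites, that drive the case/control difference.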
K Muralidharan
The Maharajah Sayajirao University of Baroda
India
Title: Theory of inliers: Modeling and applications
Biography:
K Muralidharan is currently working as a Professor and the Head of the Department of Statistics, Faculty of Science, The Maharajah Sayajirao University of Baroda, India. He is also the Director of the Population Research Centre, MSU Baroda. He obtained his MSc degree in Statistics from Calicut University, Kerala, and his MPhil and PhD in Statistics from Sardar Patel University, and has completed a Post-Doctoral Fellowship at the Institute of Statistical Science, Academia Sinica, Taiwan. He is an internationally qualified Six Sigma Master Black Belt. He has won a number of awards and fellowships, including the Commonwealth Academic Fellowship sponsored by the British Council, UK. Recently, he was awarded the “Young emerging future leader of Quality and Reliability” by the Society for Reliability Engineering, Quality and Operations Management (SREQOM), New Delhi. He is a Nominated Principal Member of the Bureau of Indian Standards (BIS), New Delhi. He is currently the Secretary of the Indian Society for Probability and Statistics.
Abstract:
An inlier in a set of data is an observation or subset of observations, not necessarily all zeroes, which appears to be inconsistent with the remaining data set. Inliers result from instantaneous or early failures usually encountered in life testing and reliability, financial, management, clinical trial and many other studies. Unlike in outlier theory, here inliers form a group of observations which are defined by the model itself. With the inclusion of inliers, the model becomes either a non-standard distribution or one having more than two modes, and hence the usual methods of statistical inference may not be appropriate. We discuss some inlier-prone models under certain assumptions, to study the estimation of inliers in the exponential distribution. Various inlier-prone models and estimation procedures are discussed. The detection of inliers and the problems associated with detection are presented. An illustration and a real-life example are also discussed.
Ramón Santana Fernández
University of Informatics Science
Cuba
Title: Fingerprint template protection scheme based on minutiae structures; security issues and vulnerabilities
Biography:
Ramón Santana Fernández is a PhD student who graduated as an Engineer in Informatics Sciences at the University of Informatics Sciences in 2011. He is a researcher in the biometrics field with 7 years of experience, having started as a research assistant in the third year of his degree in the Dactilab project in 2009. He has worked, as a student and as an employee, in biometric software development and process research, obtaining awards for his participation in investigations and software development in the field of automatic fingerprint identification systems at the Identification and Digital Security Center. He has published articles in journals and events at the University of Informatics Sciences.
Abstract:
The implementation of biometric solutions for user authentication in daily tasks has caused great concern about the safety and privacy of biometric data. Different vulnerabilities detected in automated fingerprint identification systems could expose the minutiae if the templates are stored in plain text. To solve this security issue, several minutiae template protection models have been proposed, such as fuzzy vault, biohashing and cancelable templates; however, the minutiae alignment process is required before template matching is executed in order to increase the probability of finding a positive match. To protect the biometric data efficiently it is necessary to meet three basic requirements: cryptographic security, revocability and performance; however, most of the models described to date fail in this task. A fingerprint minutiae template protection scheme must capture as much identifying information of the fingerprint as possible and solve the problem of template alignment before the matching process is executed in the protected domain. A study of the fingerprint minutiae template protection models, specifically those that start the process from features derived from the minutiae using minutiae structures, and of their main strengths, weaknesses and vulnerabilities, was conducted in this work. Analyzing the types of attacks described in the bibliography to obtain the original biometric data from protected templates, and the attacks performed on minutiae triplets such as minutiae vicinity decomposition, is the main objective of this research. As a result, the vulnerabilities of each minutiae structure are identified, the elements needed to propose a new minutiae structure are analyzed and initial results are discussed.
Mikhail Moshkov
King Abdullah University of Science and Technology
Saudi Arabia
Title: Extensions of dynamic programming for decision tree study
Biography:
Mikhail Moshkov has been a Professor in the CEMSE Division at King Abdullah University of Science and Technology, Saudi Arabia, since October 1, 2008. He earned his master’s degree from Nizhni Novgorod State University, received his doctorate from Saratov State University, and his habilitation from Moscow State University. From 1977 to 2004, Dr. Moshkov was with Nizhni Novgorod State University. Since 2003 he worked in Poland at the Institute of Computer Science, University of Silesia, and since 2006 also at the Katowice Institute of Information Technologies. His main areas of research are complexity of algorithms, combinatorial optimization, and machine learning. Dr. Moshkov is author or coauthor of five research monographs published by Springer.
Abstract:
In the presentation, we consider extensions of the dynamic programming approach to the study of decision trees as algorithms for problem solving, as a way of knowledge extraction and representation, and as classifiers which, for a new object given by values of conditional attributes, define a value of the decision attribute. These extensions allow us (i) to describe the set of optimal decision trees, (ii) to count the number of these trees, (iii) to make sequential optimization of decision trees relative to different criteria, (iv) to find the set of Pareto optimal points for two criteria, and (v) to describe relationships between two criteria. The results include the minimization of average depth for decision trees sorting eight elements (a question open since 1968), improvement of upper bounds on the depth of decision trees for diagnosis of 0-1-faults in read-once combinatorial circuits, existence of totally optimal (with minimum depth and minimum number of nodes) decision trees for monotone Boolean functions with at most six variables, study of the time-memory tradeoff for decision trees for corner point detection, study of relationships between the number and maximum length of decision rules derived from decision trees, and study of the accuracy-size tradeoff for decision trees, which allows us to construct sufficiently small and accurate decision trees for knowledge representation, as well as decision trees that, as classifiers, often outperform decision trees constructed by CART. The end of the presentation is devoted to an introduction to KAUST.
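The flavor of the approach - recursing over subtables of a decision table and memoizing the subproblems - can be shown in miniature. A sketch computing only the minimum tree depth (the published extensions also describe, count and multi-criteria-optimize the trees, which this does not attempt):

```python
from functools import lru_cache

# A decision table as a tuple of (attribute-values, decision) rows.
TABLE = (
    ((0, 0, 1), 0),
    ((0, 1, 1), 1),
    ((1, 0, 0), 1),
    ((1, 1, 0), 0),
)

@lru_cache(maxsize=None)
def min_depth(rows):
    # Subproblem: minimum depth of a decision tree for this subtable.
    if len({d for _, d in rows}) <= 1:
        return 0                       # a single decision remains: a leaf suffices
    best = float("inf")
    for a in range(len(rows[0][0])):
        vals = {r[0][a] for r in rows}
        if len(vals) < 2:
            continue                   # attribute does not split this subtable
        # Branch on attribute a; each value's subtable is a smaller subproblem,
        # and memoization turns the recursion into dynamic programming.
        worst = max(min_depth(tuple(r for r in rows if r[0][a] == v)) for v in vals)
        best = min(best, 1 + worst)
    return best

print(min_depth(TABLE))   # 2 for this XOR-like table
```

Keeping, at each subtable, the whole set of optimal subtrees (or Pareto points of two criteria) instead of a single number is what the extensions described above add on top of this basic recursion.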
Imran Mahmood
University of Dammam
Saudi Arabia
Title: Foundations and technical challenges of spatio-temporal epidemiological surveillance
Biography:
Imran Mahmood is currently working as an Assistant Professor at the College of Computer Science, University of Dammam, and is involved in research activities in the area of healthcare information technology, with a specialty in epidemiological informatics. He worked as an Assistant Professor at the University of Engineering & Technology, Lahore, and as Lead Researcher at the Center for Visual Analytics Research, where he supervised research and development of different epidemic surveillance projects. He earned a Doctoral degree in Computer Systems at the School of Information and Communication Technology (ICT), KTH Royal Institute of Technology, Sweden, in 2013, and a Master’s degree in Software Engineering of Distributed Systems at the same school in 2007, along with comprehensive knowledge of modeling & simulation and visual analytics. He worked in collaboration with the Swedish Defense Research Agency (FOI) during his Master’s and Doctoral research. He has delivered workshops, lectures and invited talks on his topics of research interest.
Abstract:
In this talk, we will discuss fundamental concepts of Epidemiological Surveillance (ES). ES is the ongoing systematic collection, visualization, analysis and interpretation of health data, collected for the purpose of timely dissemination of outbreak forecasts. It is an investigational approach where health experts are provided with an automated set of tools for real-time data collection from various health departments and for monitoring of disease indicators, to detect outbreaks earlier than would otherwise be possible with traditional diagnosis-based methods. Hence the detection of adverse effects can be made at the earliest possible time, possibly even before disease diagnoses can be confirmed through clinical procedures and laboratory tests. We will highlight key challenges faced in the development and operation of epidemiological surveillance systems, arising mainly from: (A) the complex characteristics and diverse nature of infectious diseases, (B) the distinct nature of population dynamics, mobility and demographic factors, and (C) the geographic nature, environment and weather conditions of the area under study. We will discuss evolutionary developments in the trends, methods and technologies of surveillance systems and discuss how this progress is addressing the key challenges. In the end, we will argue how a sophisticated health surveillance system helps in alleviating potential health risks, minimizes the threats of natural or man-made disasters and eventually supports effective decision making in emergency management.
Mohammad Imran
King Faisal University
Saudi Arabia
Title: Designing of robust multi-biometric systems for person authentication
Biography:
Mohammad Imran received his PhD in Computer Science in 2012; the title of his thesis is ‘Some Issues Concerning Biometric system’, completed under the guidance of Professor G Hemantha Kumar. During 2012-13, he was a Post-Doctorate Fellow under a TRC (The Research Council, Oman) sponsored program. Currently, he is working as an Assistant Professor at King Faisal University, Saudi Arabia. Prior to this, he was working as an Associate Consultant at WIPRO Technologies, Bangalore. His areas of research interest include machine learning, pattern recognition, computer vision, biometrics, image processing, predictive analysis, algorithms, data structures and linear algebra. He has authored 25 international publications, which include journals and peer-reviewed conferences.
Abstract:
There is a global concern to implement accurate person verification in various facets of social and professional life. This includes banking, travel, medical services and secure access to social security services. While biometrics has been deployed with various choices such as face, fingerprint, iris, etc., the demand for higher levels of security has influenced two main things: one is finding newer, more universal biometric traits, and the other is multimodal options. Most of the biometric systems employed in real-world applications are unimodal; they rely on the evidence of a single source of information for authentication, which is easier to install and computationally less demanding. However, unimodal systems have to contend with a variety of problems, which in turn increases the False Acceptance Rate (FAR) and False Reject Rate (FRR). A good system needs a very low FAR and a very low FRR, and this can be achieved by a multimodal system. A multimodal system is a subset of multi-biometric systems which establishes identity based on the evidence of multiple biometric traits. Thus, in this presentation, we address critical issues in designing a multimodal biometric system, i.e., choices of biometric modalities, feature extraction algorithms and fusion strategies. Complementary and supplementary information acquired by feature extraction algorithms is addressed in our work for its contribution towards the improvement of the recognition rate. A fundamental issue in designing a multimodal system lies in fusing the information from sensed data. Fusion methodologies at four different levels, viz., sensor, feature, score and decision level, have been evaluated for performance with appropriate fusion rules. Fusion methodologies have been exploited for addressing different combinations of multimodal systems.
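A minimal sketch of one of the four fusion levels - score-level fusion with min-max normalization and a weighted sum rule - on hypothetical matcher scores:

```python
import numpy as np

def min_max_normalize(scores):
    # Map each modality's raw matcher scores to a common [0, 1] scale.
    s = np.asarray(scores, dtype=float)
    return (s - s.min()) / (s.max() - s.min())

# Hypothetical matcher scores for 6 verification attempts from two modalities.
face_scores = [0.35, 0.80, 0.55, 0.90, 0.20, 0.70]
finger_scores = [42, 88, 50, 95, 30, 77]     # a different native scale

# Weighted sum rule after normalization; the weights reflect modality reliability.
fused = 0.4 * min_max_normalize(face_scores) + 0.6 * min_max_normalize(finger_scores)
decisions = fused >= 0.5                     # the threshold trades off FAR against FRR
print(fused.round(2), decisions)
```

Raising the decision threshold lowers FAR at the cost of FRR, which is the trade-off a multimodal design aims to improve over either modality alone.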
Ong’ala J
Kenya Agricultural and Livestock Research Organization
Kenya
Title: The use of principal component analysis in sugarcane clone selection
Biography:
Ong’ala J has BSc and MSc degrees in Applied Statistics. He is the Head of Sub-Unit, Research Methods and Analytics, Kenya Agricultural and Livestock Research Organization.
Abstract:
In the process of phenotypic evaluation of sugarcane, many traits are simultaneously evaluated. These traits are often highly interrelated. Evaluation of all these traits is costly and may not enhance selection response. In this study, we aim at using the Principal Component Analysis (PCA) to identify representative traits for phenotypic characterization of sugarcane, and thereby to select superior clones in the breeding process. The results indicate that when PCA is used, only 10 out of 19 traits will be significant in identifying the superior clones and their contributions to the selected traits are quantified.
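A minimal sketch of the selection logic with scikit-learn (synthetic stand-in data, not the sugarcane trials; the 88% threshold comes from the abstract):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(6)

# Hypothetical phenotypic data: 100 sugarcane clones x 19 correlated traits.
X = rng.normal(size=(100, 19))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=100)   # make some traits highly interrelated

pca = PCA().fit(StandardScaler().fit_transform(X))
cum_var = np.cumsum(pca.explained_variance_ratio_)
k = int(np.searchsorted(cum_var, 0.88)) + 1      # components covering ~88% of variability
print("components needed:", k)

# Traits with large absolute loadings on the retained components are the
# candidates for a reduced measurement set in clone selection.
loadings = np.abs(pca.components_[:k]).max(axis=0)
print("top traits:", np.argsort(loadings)[::-1][:10] + 1)   # 1-based trait indices
```

The study's finding that 10 of 19 traits suffice corresponds to keeping only the traits that load heavily on the retained components, cutting evaluation cost without hurting selection response.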
Tintu Thomas
C U Shah Medical College
India
Title: ANFIS based approach for mining high dimensional gene expression microarray data: A comparative study
Biography:
Tintu Thomas completed her Master’s degree in Biostatistics from Mahatma Gandhi University and a Postgraduate Diploma in Epidemiology from the Indian Institute of Public Health, India. She is currently working as a Lecturer in Biostatistics in the Department of Community Medicine. She has more than 7 years of experience as a Lecturer in Biostatistics, teaching paramedical and postgraduate medical students, and has 6 research publications to her name. She was one of the invited speakers for a national seminar on stochastic medicine in Kerala, India. She has also worked as Course-in-Charge for Master’s level Biostatistics, and has five years of research experience in the applied statistics area, especially in gene expression data analysis.
Abstract:
Microarray technology is used to measure the expression of many thousands of genes simultaneously. Identification and classification of sets of genes out of these thousands is a complex process. In microarray data, a major challenge is the need for a robust method for proper identification of differentially expressed genes. In this paper, we made a comparative study of the genomic classification performance of conventional neural network methods and fuzzy inference methods. We used fuzzy inference based classification rules for extracting significant genes from the gene expression data set. Fuzzy rules were utilized to train the Fuzzy Inference System (FIS), which classified the gene expression levels into useful output forms, namely expressed and non-expressed genes. It was found that adaptive neuro-fuzzy inference methods worked better in classifying differentially expressed genes compared with other conventional methods.
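ANFIS training itself is beyond a few lines, but the fuzzy-inference core it adapts - fuzzify the inputs, fire the rules, compare rule strengths - can be sketched with a fixed (untrained) Mamdani-style rule base; the membership breakpoints below are hypothetical:

```python
import numpy as np

def tri(x, a, b, c):
    # Triangular membership function with support [a, c] and peak at b.
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def classify(log_fold_change, p_value):
    # Fuzzify the two inputs (hypothetical breakpoints).
    big_change = tri(abs(log_fold_change), 0.5, 2.0, 6.0)
    significant = tri(-np.log10(p_value), 1.0, 3.0, 10.0)

    # Rule base (Mamdani min for AND, max for OR):
    expressed = min(big_change, significant)          # IF change is big AND p is significant
    not_expressed = max(1 - big_change, 1 - significant)

    return "expressed" if expressed > not_expressed else "non-expressed"

print(classify(2.1, 0.001))   # expressed
print(classify(0.2, 0.30))    # non-expressed
```

What ANFIS adds is exactly the adaptive part: the membership breakpoints and rule consequents are tuned from training data by a neural-network-style learning loop instead of being fixed by hand.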
Michael A Idowu
Abertay University
UK
Title: Instantaneous, intelligent and robust time series data modelling and network inference
Biography:
Michael A Idowu earned both his PhD and MSc qualifications in Complex Systems Modelling (Systems Biology) and Software Engineering (Computer Games Technology), respectively with distinction from Abertay University. As a Software Engineer and Theoretician, his research focuses on the invention and further development of model theory for instantaneous development of new models of complex systems, including biological systems. Working at the interface between mathematics and computer science, his expertise lies in mathematical and data analysis, the development of novel analytical methods for time series data and inference of interaction networks among measurables in time series data.
Abstract:
Dynamic processes in complex systems may be profiled by measuring system properties over time. One way of capturing and representing such complex processes or phenomena is through ODE models of measured time series data. However, construction of ODE models purely from time series data is extremely difficult. First, the system targeted must be identified. Second, the parameters of the model must be estimated in a data consistent manner. Lastly, the constructed model must be capable of exact simulation of the measured historical data as though the constructed model was the means (source) of the acquired data. Hence, intelligent modelling of exact data may be a necessity in modelling systems that are not well-studied or well-known. The requirement to achieve the above-mentioned objectives within a short period of time, i.e., in order to cope with occasional or necessary demands of rapid data utilisation, makes both model construction and complex systems identification a modeller’s nightmare. In this presentation, a novel dynamic modelling technique (framework), invented and currently being further developed by the author, is proposed and presented as an effective computational method for reconstructing data-consistent ODE models, which adequately addresses the challenges of instantaneous systems identification and automated parameter estimation, under limited data and under-determined conditions. These dynamic modelling techniques (algorithms) enable data-consistent models of complex systems to be automatically constructed, with or without making a priori assumptions about the underlying network, which guarantees successful construction of feasible models in a matter of seconds. These claims are then justified with applications and examples.
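For the simplest linear case, the reconstruction idea reduces to a least-squares problem; a minimal sketch (a hypothetical 3-variable linear ODE with finite-difference derivatives, not the author's framework):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical time series: 3 system properties measured at 50 equally spaced times,
# generated from a known interaction network A_true for checking the recovery.
A_true = np.array([[-0.5, 0.2, 0.0],
                   [0.0, -0.3, 0.1],
                   [0.1, 0.0, -0.4]])
dt, n = 0.1, 50
X = np.zeros((n, 3))
X[0] = [1.0, 0.5, 0.2]
for k in range(n - 1):                 # simulate dx/dt = A x by forward Euler
    X[k + 1] = X[k] + dt * X[k] @ A_true.T

# Inference: approximate derivatives by finite differences, then solve the
# least-squares problem dX ≈ X A^T column by column (a data-consistent linear ODE).
dX = np.gradient(X, dt, axis=0)
A_hat, *_ = np.linalg.lstsq(X, dX, rcond=None)
print(np.round(A_hat.T, 2))            # recovered interaction network (≈ A_true)
```

The off-diagonal entries of the recovered matrix are the inferred interactions among the measured quantities; the framework described above goes well beyond this toy case, handling exact simulation of the historical data, nonlinearity and under-determined conditions.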
- Young Researchers Forum
Session Introduction
Juliana Torres Sánchez
Universidad Nacional de Colombia
Colombia
Title: Effect of atypical populations in meta-analysis
Time : 17:35-17:45
Biography:
Juliana Torres Sánchez completed her undergraduate degree at Universidad Nacional de Colombia, and she is now studying in the graduate program in Statistics at the same university in Colombia. She has published an article entitled ‘Efecto de niveles crecientes de nitrógeno no protéico dietario sobre la concentración de precursores gluconeogénicos en hígado bovino’ in the Journal Facultad Nacional de Agronomía, Medellín (ISSN: 0304-2847, Universidad Nacional de Colombia, v. 63, p. 5363-5372, 2010).
Abstract:
Meta-analysis is a statistical technique that allows different studies to be combined in order to obtain conclusions that unify their results, helping to reach a true understanding of the response variable analyzed. In this research, the effects that an atypical study has on the results of a meta-analysis are discussed, and recommendations are provided for dealing with them, where the response of interest refers to means and/or proportions. General Objective: To generate a proposal on the identification and appropriate handling of atypical studies in meta-analysis. Specific Objectives: To determine the effect of an atypical study on meta-analysis results, to introduce the methodology by using simulation in the ‘metafor’ package of the R software, to develop the corresponding conclusions, and to suggest solutions to detect and deal with atypical studies. Methods or Models: This research project is based on carrying out a meta-analytic process to evaluate the effect of atypical populations in meta-analysis through simulations, using statistical software that supports control over variations in the mean and variance of a single study that is considered atypical, under controlled levels. The simulation procedure is based on shifts in mean and variance, both up and down, so that their effects can be observed in the meta-analysis and inferences can be drawn for other types of studies with various features, without diminishing validity. The influence that an atypical study has on an analytical process may be negligible, or it can be so significant that it radically changes the inferences reached by the meta-analysis, presenting conclusions as true when they are false or vice versa.
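A minimal leave-one-out sensitivity sketch in Python (hypothetical studies and fixed-effect pooling for brevity, rather than the 'metafor' simulations of the project) makes the influence of a single atypical study visible:

```python
import numpy as np

# Hypothetical study effects and standard errors; study 5 is a candidate outlier.
effects = np.array([0.20, 0.25, 0.18, 0.22, 1.40])
se = np.array([0.08, 0.10, 0.09, 0.07, 0.15])

def pooled(eff, s):
    # Fixed-effect (inverse-variance) pooled estimate.
    w = 1 / s**2
    return np.sum(w * eff) / np.sum(w)

full = pooled(effects, se)
for i in range(len(effects)):
    loo = pooled(np.delete(effects, i), np.delete(se, i))
    print(f"without study {i + 1}: {loo:.3f} (shift {loo - full:+.3f})")
# A large shift when one study is removed flags it as atypical/influential.
```

Shifting the mean or variance of one simulated study and re-pooling, as the project does, is the systematic version of this one-at-a-time check.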
Rachel Oluwakemi Ajayi
University of KwaZulu-Natal
South Africa
Title: The association of undernutrition and cognitive outcomes among 4-6 year old South African children
Time : 17:45-17:55
Biography:
Rachel Oluwakemi Ajayi is a PhD student in Statistics at the University of KwaZulu-Natal, researching the factors that influence the cognitive development of children in impoverished communities. She holds a Master of Science in Statistics from the University of Lagos, Nigeria, with research on error analysis of the generalized negative binomial, and a Bachelor of Science in Statistics from the University of Ilorin, Nigeria, with research on a statistical quality study, characterization and control of faults reported to NITEL. She has published one paper in a reputed journal and is currently working on a manuscript.
Abstract:
Background: The study investigated 4-6 year old children’s health, nutritional status and cognitive development in a predominantly rural area of KwaZulu-Natal, South Africa. Methods: This was the baseline of a longitudinal cohort study (Asenze, Phase 1) of pre-school children in a rural area of KwaZulu-Natal, South Africa. The study investigated the association of demographic variables, site (geographic area), child’s HIV status, child’s haemoglobin level and anthropometric measures (height-for-age z-scores, weight-for-age z-scores, mid-upper arm circumference) with children’s cognitive performance, measured by the Grover-Counter scale and Kaufman’s KABC-II subtests. General linear models were used to determine the effect of the predictors, and factor analysis was incorporated to create global cognitive scores. Result: Based on the data, the effects of haemoglobin, sex and weight-for-age were not as significant as other factors. The principal factors in children’s cognitive outcomes were site, education, height-for-age, mid-upper arm circumference, HIV status and age. Children who had low cognitive scores came from poorer sites, had less pre-school education and were older, while HIV-positive children were most likely to have low height-for-age and mid-upper arm circumference. Conclusion: There is a need to improve the nutrition of children in this region of KwaZulu-Natal in order to improve their cognitive outcomes.
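For concreteness, a general linear model of the type described might be fit in R as below; the data frame and variable names are hypothetical stand-ins for the study's measures:

fit <- lm(global_cognitive_score ~ site + preschool_education + age_months + sex +
            hiv_status + haemoglobin + haz + waz + muac,
          data = asenze)   # 'asenze' is a placeholder data frame
anova(fit)                 # assess each predictor's contribution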
Rowena F Bastero
University of Maryland
USA
Title: A swapping method based on covariate classification for average treatment effect estimation
Biography:
Rowena F Bastero completed her Master’s degree at the University of the Philippines and is currently completing her PhD at the University of Maryland, Baltimore County. At present, her areas of interest are propensity score analysis and meta-analysis under the guidance of Dr Bimal K Sinha. Her other area of research is spatio-temporal modeling, in which she has published a paper entitled “Robust Estimation of a Spatiotemporal Model with Structural Change” in Communications in Statistics - Simulation and Computation.
Abstract:
In observational studies, systematic differences in the covariates of the treatment and control groups may exist, which poses a problem in estimating the average treatment effect. Although propensity score analysis provides a remedy to this issue, assessments made on the matched pairs or groups formed through these scores continue to reflect covariate imbalance between the two groups. Hence, a modified method is proposed that guarantees groups more balanced with respect to some, if not all, covariates and consequently provides more stable estimates. This matching procedure estimates the average treatment effect using techniques that infuse “swapping” of models based on classical regression and meta-analysis procedures. The “swapping” procedure allows imputation of the missing potential outcomes Y(1) and Y(0) for units in the control and treatment groups, respectively, while meta-analysis provides a means of combining the effect sizes calculated from each matched group. Simulated and real data sets are analyzed to evaluate the comparability of estimates derived from this method and those based on propensity score analysis. Results indicate superiority of the estimates calculated from the proposed model, given their smaller standard errors and the higher power of the test. The proposed procedure ensures perfect balance within matched groups with respect to categorical variables and addresses issues of homogeneous effect sizes. It also identifies and incorporates relevant covariate information into the estimation procedure, which consequently allows derivation of less biased estimates of the average treatment effect.
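A schematic sketch of the “swapping” idea as described above (arm-specific outcome models used to impute each unit's missing potential outcome), not the authors' exact estimator; the data frame 'dat' with treatment indicator 'z', outcome 'y' and covariates 'x1', 'x2' is hypothetical:

m1 <- lm(y ~ x1 + x2, data = subset(dat, z == 1))   # treated-arm outcome model
m0 <- lm(y ~ x1 + x2, data = subset(dat, z == 0))   # control-arm outcome model

y1 <- ifelse(dat$z == 1, dat$y, predict(m1, newdata = dat))  # impute Y(1) for controls
y0 <- ifelse(dat$z == 0, dat$y, predict(m0, newdata = dat))  # impute Y(0) for treated
ate <- mean(y1 - y0)                                         # average treatment effect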
Som B Bohora
The University of Oklahoma Health Sciences Center
USA
Title: Generalization of the semi-parametric AUC regression model with discrete covariates
Biography:
Som B Bohora is a Research Biostatistician in the Behavioral and Developmental Pediatrics section at OUHSC and a student at the College of Public Health. He was trained in biostatistics and epidemiology, and has research experience in Fetal Alcohol Spectrum Disorders (FASD), HIV/AIDS clinical trials and child maltreatment prevention. He is interested in applications of statistical computing, predictive analytics and dynamic reporting in these research arenas.
Abstract:
In this research, we considered data with a non-normally distributed response variable. In particular, we extended an existing AUC model that handles only two discrete covariates to a generalized AUC model that can be used on data with any number of discrete covariates. Compared with similar methods, which require iterative algorithms and bootstrap procedures, our method involves only closed-form formulae for parameter estimation, hypothesis testing and confidence intervals. The issue of model identifiability is also discussed. Our model has broad applicability in clinical trials due to the ease of interpreting model parameters, and its utility is illustrated using data from a clinical trial aimed at evaluating education materials for prevention of Fetal Alcohol Spectrum Disorders (FASDs). Finally, for a variety of design scenarios, our method produced parameter estimates with small biases and confidence intervals with nominal coverage, as demonstrated by simulations.
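As background, the AUC that such regression models target is the probability that a response from one group exceeds a response from the other; a minimal sketch of its empirical (Mann-Whitney) estimate in R, with hypothetical exponential responses:

auc_hat <- function(y1, y0)            # P(Y1 > Y0), ties counted as half
  mean(outer(y1, y0, ">") + 0.5 * outer(y1, y0, "=="))

y_trt <- rexp(200, rate = 0.5)         # hypothetical non-normal responses, group 1
y_ctl <- rexp(200, rate = 1.0)         # hypothetical non-normal responses, group 0
auc_hat(y_trt, y_ctl)                  # close to the true value 2/3 here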
Dereje W Gudicha
Tilburg University
The Netherlands
Title: Power analysis for the likelihood ratio test in latent Markov models: Short-cutting the bootstrap p-value based method
Biography:
Dereje W Gudicha is a PhD student at Tilburg University, The Netherlands. His PhD dissertation, supervised by Professor Jeroen K Vermunt, deals with power and sample size computation methods for both simple mixture models for cross-sectional data and complex mixture models for longitudinal data. The dissertation contributes to the field of mixture modelling in several ways: it addresses the factors that affect the power of statistical tests for mixture distributions and, for hypothesis tests where the asymptotic theory is not warranted, presents a method for power and sample size computation. He received his Master’s in Applied Statistics from Addis Ababa University and a Research Master’s in Social and Behavioural Science from Tilburg University. He has several years of university teaching experience and has published his research in high-quality journals.
Abstract:
In recent years, the latent Markov (LM) model has proven useful for identifying distinct unobserved states and transitions between these states over time in longitudinally observed responses. The bootstrap likelihood ratio (BLR) test is becoming a gold standard for testing the number of states, yet little is known about power analysis methods for this test. This paper presents a short-cut to a p-value based power computation for the BLR test. The p-value based power computation involves computing the power as the proportion of bootstrap p-values (PBP) for which the null hypothesis is rejected, which requires performing the full bootstrap for multiple samples of the model under the alternative hypothesis. Power computation using the short-cut method involves the following simple steps: obtain the parameter estimates of the model under the null hypothesis, construct the empirical distributions of the likelihood ratio under the null and alternative hypotheses via Monte Carlo simulations, and use these empirical distributions to compute the power. The advantage of this short-cut method is that it is computationally cheaper and simple to apply for sample size determination.
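A generic sketch of the short-cut in R; the data-generating and model-fitting steps are abstracted into placeholder functions sim_lr_h0 and sim_lr_h1, which would simulate a sample under the respective hypothesis and return the likelihood-ratio statistic:

lr0 <- replicate(1000, sim_lr_h0())    # empirical null distribution of the LR
lr1 <- replicate(1000, sim_lr_h1())    # empirical alternative distribution

crit  <- quantile(lr0, 0.95)           # critical value at alpha = 0.05
power <- mean(lr1 > crit)              # power = P(reject H0 | H1 true)

The full PBP approach would instead run an entire bootstrap within each of the 1000 alternative samples, which is the cost the short-cut avoids.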
- Young Researchers Forum
Session Introduction
Bingjie Li
The University of Texas
USA
Title: Combining genetic, molecular, cellular and statistical analyses to determine pathogen variations
Time : 16:45-16:55
Biography:
Bingjie Li holds BM and MPH degrees and is a PhD candidate at the University of Texas Health Science Center at Houston under the supervision of Dr Zhi-Dong Jiang. She completed her Clinical Medicine degree at Weifang Medical University and her Master of Public Health at the University of Texas School of Public Health. She is currently working on several research projects within the Center for Infectious Diseases that explore pathogen variations through a combination of genetic, molecular, cellular and statistical methods.
Abstract:
Individualized, target-specific molecular medicine will improve patient management and the effective treatment of diseases in the future. Individualized medicine will emphasize the differences in both host and pathogen for every disease. Diagnoses will be determined at the molecular level, and treatments will be applied according to the differences of the individual patient as well as the pathogen. We previously demonstrated that a single genetic mutation can alter the onset time, severity and location of disease in the host. In this study, we intend to study the variations of a single pathogen. Clostridium difficile (C. difficile) is a gram-positive, anaerobic, spore-forming bacillus that can cause pseudomembranous colitis requiring colectomy and resulting in death in hospitalized patients. Patients receiving first-line traditional antibiotic treatments for C. difficile-associated disease (CDAD) have an initial recurrence rate of 25% (recurrent CDAD, RCDAD). Once a recurrence occurs there is a 45% chance of a second recurrence, and after the second recurrence, 65% of patients will have a third recurrence. We hypothesize that C. difficile has distinct variants and that this variation could contribute to treatment failure. To test our hypothesis, 148 strains of C. difficile were collected from the hospital. All samples were confirmed as C. difficile by culture and toxin assays. Bacterial DNA was extracted, and the molecular components were further analyzed by PCR amplification using four pairs of primers followed by sequencing. Bacterial genetic characteristics were identified from their molecular fingerprints by multilocus sequence typing (MLST) and dendrogram analysis. Although all bacteria are similar to the standard positive control by culture and toxin assays, biostatistical analysis reveals that they belong to different clusters by MLST. The data suggest that the bacteria from the environment and from infected patients are different, and that bacteria within the same cluster still have genetic differences. These genetic differences may be one of the contributors to treatment failure. The combination of genetic, molecular, cellular and statistical analysis will not only help us understand the complexity of disease processes but also guide us in developing sufficiently personalized disease management and discovering more effective drugs to treat diseases.
Wang Shao Hsuan
National Taiwan University
Taiwan
Title: Generalized concordance measure: Generalized regression model and dimension reduction
Time : 16:55-17:05
Biography:
Wang Shao Hsuan is currently a PhD student in the Department of Mathematics, National Taiwan University. He published a paper in an SCI-indexed journal while pursuing his Master’s degree.
Abstract:
In the scientific research literature, rank-based measures have been widely used to characterize a monotonic association between a univariate response and some transformation of multiple covariates of interest. Instead of using a linear combination of covariates, we introduce a multivariate polynomial score to compute the corresponding concordance index through more general semi-parametric regression models. This involves estimating the degree of the multivariate polynomial and the central subspace (CS). To deal with this research issue, we propose a BIC-type estimation approach, implemented by an effective computational algorithm, that achieves model selection consistency.
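A minimal sketch of the concordance index underlying such measures, computed in R for a hypothetical polynomial score; the monotone model below is invented for illustration:

concordance <- function(y, s) {
  ut <- upper.tri(diag(length(y)))
  dy <- sign(outer(y, y, "-"))[ut]     # pairwise response orderings
  ds <- sign(outer(s, s, "-"))[ut]     # pairwise score orderings
  mean((dy * ds)[dy != 0] > 0)         # concordant fraction of comparable pairs
}

x <- matrix(rnorm(200), ncol = 2)                  # hypothetical covariates
y <- (x[, 1] + x[, 2])^3 + rnorm(100, sd = 0.1)    # response monotone in a polynomial score
concordance(y, x[, 1] + x[, 2])                    # near 1 under this model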
Yusuf O B
University of Ibadan
Nigeria
Title: An insight into the misuse of the logistic regression model in the face of non-convergence
Time : 17:05-17:15
Biography:
Yusuf O B is a student at the University of Ibadan, Nigeria. She has expertise in medical statistics, and her research interests include mathematical epidemiology of infectious diseases (malaria), multilevel modeling and analyses of longitudinal data.
Abstract:
Background: The logistic regression model is widely used in health research for descriptive and predictive purposes. Unfortunately, researchers are sometimes not aware that the underlying principles of the technique have failed when the maximum likelihood algorithm does not converge. Young researchers, particularly postgraduate students, may not know why the separation problem, whether quasi- or complete, occurs, how to identify it and how to fix it. Objective: This study was designed to critically evaluate convergence issues in articles employing logistic regression analysis published in an African medical journal between 2004 and 2013. Methods: Problems of quasi- or complete separation were described and illustrated with the National Demographic and Health Survey dataset, and articles that employed logistic regression were assessed. Results: A total of 581 articles were published, of which 40 (6.9%) used binary logistic regression; 24 (60.0%) stated the use of logistic regression in the methodology, while only 3 (12.5%) of these properly described the procedures. None of the articles assessed model fit, and the majority presented insufficient details of the procedures. In addition, of the 40 articles that used logistic regression, the problem of convergence occurred in 6 (15.0%). Conclusion: Logistic regression tended to be poorly implemented in studies published between 2004 and 2013. Our findings show that the procedure may not be well understood by researchers, since very few described the process in their reports, and many may be totally unaware of the problem of convergence or how to deal with it.
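To make the separation problem concrete, a minimal R sketch on invented data, with Firth's penalized likelihood (via the ‘logistf’ package) shown as one commonly recommended remedy:

x <- 1:6
y <- c(0, 0, 0, 1, 1, 1)                # y perfectly predicted by x > 3: complete separation

fit <- glm(y ~ x, family = binomial)    # warns that fitted probabilities of 0 or 1 occurred
coef(fit)                               # the slope estimate is effectively infinite

library(logistf)
logistf(y ~ x)$coefficients             # Firth correction yields finite estimates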
Anam Riaz
National College of Business Administration and Economics (NCBA&E)
Pakistan
Title: Role of BMS and infrastructure in crude death rate and infant mortality rate
Time : 17:15-17:25
Biography:
Anam Riaz completed her BS in Statistics at the age of 22 at GC University Lahore, Pakistan. Currently she is a research scholar at the National College of Business Administration & Economics (NCBA&E), Lahore. She recently received a scholarship from the Higher Education Commission (HEC) of Pakistan to complete her PhD.
Abstract:
The aim of the study is to investigate the relationship of health-sector infrastructure and basic medical staff (BMS) with the infant mortality rate (IMR) and crude death rate (CDR), respectively. A further purpose is to describe the historical trend of health-sector infrastructure and BMS. The crude growth rate (CGR) and year-on-year (YoY) percentage change of basic medical staff and infrastructure show a downward trend after the period 1995-96. The results of one-way ANOVA show that each decade has different growth in infrastructure and basic medical staff. Similarly, regression analysis shows a linear relationship among the IMR, CDR, basic medical infrastructure and BMS. The findings of the study indicate that basic infrastructure and basic medical staff play an important role in reducing the CDR and IMR of Pakistan.
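For clarity, the two descriptive trend measures can be computed in R as below; the staffing counts are hypothetical, and the CGR is computed here as a compound annual rate, one common convention:

bms <- c(52000, 55000, 59000, 61000, 62000)             # hypothetical BMS counts by year
yoy <- 100 * diff(bms) / head(bms, -1)                   # year-on-year percentage change
n   <- length(bms) - 1
cgr <- 100 * ((bms[length(bms)] / bms[1])^(1 / n) - 1)   # growth rate over the period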
Serge M A Somda
Centre Muraz
Burkina Faso
Title: An algorithm to evaluate follow-up strategies after primary treatment in oncology by computer simulation
Time : 17:25-17:35
Biography:
Serge M A Somda graduated in statistics and in public health. He is completing his PhD in Biostatistics at the University of Toulouse. He is also employed as a Methodologist at Centre Muraz, a health research center in Burkina Faso, where he is in charge of providing methodological support to the center’s research projects. He has contributed to several research projects and is author or co-author of several peer-reviewed articles.
Abstract:
Organizing the surveillance of patients treated for cancer, for early diagnosis of recurrences, is still a subject of debate. Evidence needs to be highlighted to determine when a particular follow-up strategy is efficient enough to have a significant impact on survival. However, the clinical evaluation of follow-up programs after primary treatment is difficult to undertake. This work proposes an algorithm to evaluate a novel follow-up surveillance strategy after treatment in oncology. A computer-based, randomized, two parallel arm non-inferiority clinical trial is simulated to compare two strategies, with overall survival and cancer-specific mortality as the two endpoints evaluated. The methodology of discrete event simulation, based on the Patient Oriented Simulation Technique, was used. The natural history of the patient’s disease after primary treatment was generated for each individual. Then, for each scheduled visit date, this history could be modified if a relapse was detected early enough and efficient treatment options were available. An application of the algorithm based on breast cancer data shows its advantages in decision making.
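A schematic sketch of the discrete-event idea (not the authors' algorithm): each simulated patient receives a latent recurrence time, and the visit schedule determines whether it is caught early. The schedule, rates and detection window below are invented:

visits <- c(3, 6, 12, 18, 24, 36)                        # hypothetical schedule, months
sim_patient <- function() {
  t_rec    <- rexp(1, rate = 1 / 30)                     # latent recurrence time
  t_detect <- visits[visits >= t_rec][1]                 # first visit after recurrence (NA if none)
  early    <- !is.na(t_detect) && (t_detect - t_rec) < 3 # detected soon enough to treat?
  c(recurrence = t_rec, early_detection = early)
}
res <- t(replicate(10000, sim_patient()))
mean(res[, "early_detection"])                           # proportion of relapses caught early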
- Young Researchers Forum
Session Introduction
Hissah Alzahrani
Florida State University
USA
Title: A comparison of three models in multivariate binary longitudinal analysis
Time : 11:25-11:35
Biography:
Hissah Alzahrani has studied in computer science and statistics departments. She completed her Master’s degree in 2009 in the Statistics Department at King Abdul-Aziz University and started the joint Master’s and PhD program in Biostatistics at Florida State University in 2012. Her research interests include multivariate longitudinal data analysis and survival analysis as applied in biomedical applications and clinical trials. She is working on improving her skills in SAS and R software to accommodate advanced statistical analysis in different biostatistics fields.
Abstract:
Multivariate longitudinal data analysis plays an important role in many biomedical and social problems. In this article, we present three methods for analyzing multiple, correlated binary outcomes; each can be beneficial for particular aims. We review methods one and two and propose method three. All three methods estimate the marginal means using the GEE approach for multivariate binary longitudinal data. The first method addresses the question of estimating one group of covariate parameters for many binary outcomes while accounting for their multivariate structure. The second method estimates the covariate parameters for each binary outcome separately. The third method estimates the covariate parameters for each combination of outcomes. Our goal is to investigate the differences among the parameter estimates of the three methods. In the simulation study, we present many scenarios involving different correlation structures. In the application, we present a follow-up study (the Florida Dental Care Study) that measured three binary outcomes and five covariates at four intervals. That study usefully illustrates the variation between outcomes, since the outcomes were highly correlated.
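A minimal sketch in R of the first two modelling styles, using the ‘geepack’ package on a hypothetical long-format data set 'dent' (subject id, binary outcome y, outcome-type factor, and covariates):

library(geepack)

# Method-two style: fit each binary outcome separately
fit_sep <- geeglm(y ~ age + income, id = id, family = binomial,
                  corstr = "exchangeable", data = subset(dent, outcome == "pain"))

# Method-one style: shared covariate effects across stacked outcomes,
# with the outcome factor absorbing outcome-specific intercepts
fit_all <- geeglm(y ~ outcome + age + income, id = id, family = binomial,
                  corstr = "exchangeable", data = dent)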
Wilmar López Oviedo
Universidad Nacional de Colombia
Colombia
Title: Study of the growth rate of neo-tropical trees via non-linear mixed models
Time : 11:35-11:45
Biography:
Wilmar López Oviedo completed his education at the National University of Colombia and his Master’s degree studies at the National University of Colombia School of Statistics. His advisor, Raúl Alberto Pérez Agámez, holds a PhD in Statistical Science and is a Professor in the School of Statistics; he has published more than 10 papers in national and international journals and has participated in various research projects at the national level.
Abstract:
The growth rate of trees is essential information for understanding the dynamics of tropical forests and for ecological restoration plans; however, this information is limited for tropical tree species. Growth is measured as the change in trunk diameter (mm/year). A growth rate that changes over time implies a relationship between size and time, which is the basic assumption of models that use initial diameter as a time indicator. However, the size of a tree is not necessarily an age indicator, so this assumption needs to be evaluated. In this sense, we apply non-linear mixed models to analyze growth as a function of initial diameter. We include three different models and compare them using AIC weights (AICw). The data were obtained from permanent plots in which individuals of different sizes co-exist. We added species-specific characteristics that influence performance and biomass storage capacity. We applied a transformation appropriate to the ecology of the system to normalize the data and, splitting by diameter class, iteratively removed data to obtain symmetric distributions in each class, in order to eliminate the effect of sick trees. Finally, the generated models were evaluated using measurements of trees in the same plots taken over 20 years. If size indicates time, models fitted to the spatial data should adequately predict the increase observed in the temporal data; otherwise the assumption is invalid. We found that wood density affects growth, accounting for 25% of the diameter-growth variation, and that models using initial diameter as a time indicator are biased, as the temporal data were only weakly predicted by the resulting models.
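A minimal sketch with hypothetical names of the kind of non-linear mixed model described, using the ‘nlme’ package, with annual increment declining in initial diameter and a species-level random effect:

library(nlme)

fit <- nlme(increment ~ a * exp(-b * dbh0),              # growth vs. initial diameter
            fixed  = a + b ~ 1,
            random = a ~ 1 | species,                    # species-level variation
            start  = c(a = 5, b = 0.05),
            data   = trees)                              # 'trees' is a placeholder data frame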
Yacouba Ouattara
Université de Ouagadougou
Burkina Faso
Title: The quality of life in persons living with HIV: A follow up over 12 months in Ouagadougou-Burkina Faso
Time : 12:10-12:20
Biography:
Abstract:
Introduction: In Burkina Faso, very little is known about the quality of life of people living with HIV in their routine follow-up. The aim of the study was to measure the quality of life in the routine follow-up of people living with HIV and its change over time. Methods & Materials: 424 people living with HIV were followed for 12 months in Ouagadougou, Burkina Faso. Quality of life was measured in three interviews over time, using the World Health Organization Quality of Life brief assessment tool for patients with human immunodeficiency virus infection (WHOQOL HIV-BREF). The Friedman test was used to assess significant differences in quantitative variables across the three follow-up interviews. Groups at baseline, month 6 and month 12 were compared using the Wilcoxon signed rank test for quantitative data and McNemar’s test for qualitative variables; Pearson’s chi-squared test was used when needed. Multivariable logistic regression models were fit to estimate adjusted odds ratios (OR) and 95% confidence intervals (95% CI). Trends in the global quality of life score in subgroups (status related to HAART) were assessed using repeated measures univariate analysis of variance. A p-value less than 0.05 was considered significant. Results: At baseline, the highest quality of life scores were recorded in the domain of spirituality, religion and personal beliefs, and the lowest scores in the environmental domain; this pattern was maintained during the 12-month follow-up. The overall score increased significantly over time. Over the twelve months of follow-up, not having family support for medical care, being under highly active anti-retroviral treatment (HAART), self-perception as healthy, and having a global quality of life score below 77 were the baseline factors likely to predict an increase in the overall quality of life score. Conclusions: Our findings suggest conducting interventions linked to the environmental domain to enhance the quality of life of people living with HIV/AIDS in Burkina Faso. Particular attention could be paid to people without family support, those not yet under HAART, and those who perceive themselves as ill.
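A minimal sketch of the repeated-measures comparisons named above, on a hypothetical matrix 'qol' of global WHOQOL scores (one row per patient, columns for months 0, 6 and 12) and paired binary indicators:

friedman.test(qol)                                  # overall change across the three visits
wilcox.test(qol[, 1], qol[, 3], paired = TRUE)      # baseline vs. month 12
mcnemar.test(table(low_qol_m0, low_qol_m12))        # paired binary variable at two visits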
Abdul Basit
National College of Business Administration and Economics (NCBA&E)
Pakistan
Title: Entropy of size-biased moment exponential distribution
Time : 11:45-11:55
Biography:
Abdul Basit completed his MS in Social Sciences at the age of 31 at SZABIST Karachi, Pakistan. Currently he is a PhD research scholar in Statistics at the National College of Business Administration & Economics, Lahore, Pakistan, and Assistant Director of the Statistics & DWH Department of the State Bank of Pakistan. He has published four research papers in journals and presented many articles at national and international conferences.
Abstract:
In this article we consider the entropies of some lifetime distributions and compare them for the exponential distribution and the size-biased moment exponential distribution. For this purpose, a mathematical expression of the entropy of each has been derived. We consider the two distributions directly rather than the truncated distributions studied in the literature. A new estimator of entropy has also been introduced and its properties derived. We calculate the relative loss of entropy for the size-biased moment exponential distribution. An empirical study and graphical presentation have been conducted to illustrate which entropy measure has advantages in terms of relative loss of entropy.
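As a worked sketch under standard definitions (taking the moment exponential as the size-biased exponential, i.e. Gamma(2, β); the authors' exact construction may differ), the comparison rests on closed-form entropies:

\[
f_{\mathrm{Exp}}(x) = \frac{1}{\beta}\,e^{-x/\beta}, \qquad
H_{\mathrm{Exp}} = 1 + \ln\beta,
\]
\[
f_{\mathrm{ME}}(x) = \frac{x}{\beta^{2}}\,e^{-x/\beta}, \qquad
H_{\mathrm{ME}} = \bigl[\,k + \ln\beta + \ln\Gamma(k) + (1-k)\,\psi(k)\,\bigr]_{k=2}
= 1 + \gamma + \ln\beta,
\]

so the two entropies differ by the Euler-Mascheroni constant γ ≈ 0.5772 for every β, while a relative comparison such as the relative loss still varies with β through the ln β term.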
Closing Ceremony
Nnadozie C Dandy
Institute of Management & Technology
Nigeria
Title: Encyclopedia of mechanical engineering and CSA technology research database for the study of engineering
Time : 12:40-12:50
Biography:
Nnadozie C Dandy completed an HND in Mechanical Engineering at the Institute of Management & Technology (IMT), Enugu, Nigeria. He also attended Regional Maritime University, Accra, Ghana, where he obtained his mandatory marine certification as a Marine Engineer. He is an American Sign Language Instructor and a member of the Special Heart Royal Foundation, a non-governmental organization in Rivers State, Nigeria.
Abstract:
The CSA Technology Research Database is a bibliographic platform that is updated monthly, with temporal coverage from 1962 to the present. It combines a number of secondary databases: the Materials Research Database with METADEX, the CSA High Technology Research Database with Aerospace, and the CSA Engineering Research Database. Mechanical Engineering Abstracts is a continuation of the formerly named “ISMEC Bulletin” (v.1, no.1, July 10, 1973; ISSN 0306-0039), which appears to have ceased under this title in December 1987 (v.20, no.6). ISMEC Bulletin was published by Cambridge Scientific Abstracts. Now, Mechanical Engineering Abstracts is also known as “ISMEC, Mechanical Engineering Abstracts”; another title is “Information Service in Mechanical Engineering Bulletin”. The oldest record in the database has a publication date of 1939; however, about 50% of its records have publication dates of 1986 or later. METADEX is updated once a month, with approximately 45,000 new records added per year, and as of June 2010 contained more than 7,058,162 records. Temporal coverage for Mechanical Engineering Abstracts is from 1981 to 2002; current information is located in Mechanical & Transportation Engineering, which as of May 2010 contained more than 6,000 methods and is updated regularly. Subject coverage includes all aspects of mechanical engineering, as well as aerospace engineering, automotive engineering, naval architecture and marine engineering, railroad engineering, and materials handling. Nuclear technology is also part of this database, covering fluid flow, hydraulics, pneumatics, and vacuum technology. Heat and thermodynamics covers industrial furnaces, process heating, space heating, air conditioning, refrigeration, and cryogenics.
Cosmas Chidiebere Nsirim
University of Canberra
Australia
Title: Big data information system: Challenges and analysis
Time : 12:50-13:00
Biography:
Abstract:
Books have been written and research is constantly ongoing concerning big data, creating prospects for researchers to achieve high significance in information systems. With the materialization of new data collection technologies and highly developed data mining and analytics support, many changes are occurring in the research methodology we apply. Given these promises, there are still many unanswered questions and research challenges that need to be investigated at greater length. The contexts include political discourse, digital journalism, social networks and blogs, financial services, corporate announcements, mobile telephony, home entertainment, online gaming, online shopping, social advertising, and social commerce. The ongoing progress and implementation of big data will, at the end of the day, provide clarity on whether big data is a fad or whether it represents substantial progress in information systems research. Three theses also show how future technological developments can be used to advance the discipline of information systems. Technological progress should be used for a cumulative supplement of existing models, tools, and methods; by contrast, scientific revolutions are independent of technological progress.
Tahira Ashraf
Hajvery University Lahore
Pakistan
Title: A study of domestic violence on females and its determinants in Lahore, Pakistan
Biography:
Tahira Ashraf is an MPhil scholar (Biostatistics) and an active medical researcher in Lahore, Pakistan. She secured the top position, with a 3.96/4 CGPA, in the MSc Biostatistics program at the University of the Punjab, Lahore. She is currently serving as a research associate in one of the leading research organizations. She has more than 10 scientific publications and is presently working on a self-financed project on domestic violence against women.
Abstract:
The objective of this study was to highlight the determinants of domestic violence and its consequences in the metropolitan city of Lahore, Pakistan. This cross-sectional survey is ongoing, and by the final presentation in November 2015 we will compile the results for at least 500 victims. Analysis of the available data showed that 60% of the women were married and the remaining 40% were unmarried or divorced. Their education level was very low, and none were graduates. According to their socio-economic status, 35% were living below the poverty line and the remaining 65% were of lower to middle class. Minor to major injuries were seen in 75%, 20% had bone fractures, and the remaining 5% had blunt traumas. Among assaulters, 55% were husbands, 20% were brothers, 10% were sisters, and 15% were fathers-in-law or mothers-in-law. A total of 50% of assaulters were drug abusers, and 35% used weapons for violence. Regarding their current physical condition, 15% of subjects were critical and admitted to an intensive care unit.
Redeat Belaineh
National Animal Health Diagnostic and Investigation Center (NAHDIC)
Ethiopia
Title: Characterization of Newcastle disease virus and poultry-handling practices in live poultry markets, Ethiopia
Biography:
Redeat Belaineh acquired a Doctor of Veterinary Medicine in Ukraine and her Master’s degree in Microbiology from Addis Ababa University. Currently, she is an employee of the Ministry of Agriculture working for the National Animal Health Diagnostic and Investigation Center (NAHDIC), where she is Head of the Molecular Biology Laboratory. The center is currently implementing the ISO 17025 international quality management system in its laboratories and has a biosafety level 3 facility, which has been approved as East Africa’s supportive laboratory for the diagnosis of trans-boundary animal diseases.
Abstract:
Newcastle disease is the most severe poultry disease responsible for marked economic losses in Ethiopia. To provide a molecular characterization and classification of the Newcastle disease viruses circulating in the country, we performed phylogenetic analysis of a 260 bp fragment of the fusion gene of all 29 sequenced isolates. A cross-sectional survey was conducted at five selected live poultry market sites in Addis Ababa. In addition, baseline data on the live poultry market system were acquired through a detailed questionnaire submitted to poultry traders. We identified 44/146 positive samples, 65.9% of which were virulent strains belonging to sub-genotype VIf. The very poor biosecurity practices revealed by participants’ responses suggest that these practices may have had a heavy impact on the spread of the disease. This study provides important information on the epidemiology and control of NDV in Ethiopia and highlights the importance of implementing surveillance and biosecurity practices in live poultry markets.