Archive of "Mathematics and Computer Sciences Journal (MCSJ)"
Volume 2, Issue 9
Sep 2017

Hierarchical Clustering of Population Pyramids Presented as Histogram Symbolic Data

Mathematics and Computer Sciences Journal (MCSJ), Volume 2, Sep 2017

Abstract
A population pyramid is a very popular presentation of the age-sex distribution of the human population of a particular region. Its shape is influenced not only by demographic indicators, but also by many other social and political characteristics, such as birth control policy, wars, and lifestyle. In the paper "Clustering of population pyramids" (Korenjak-Cerne, Kejzar, Batagelj, Informatica, 2008), clusters of world countries with similar pyramid shapes were obtained using Ward's hierarchical clustering. The corresponding cluster shapes can offer field-related researchers additional insight about countries. In order to get clusters in which the gender and size of the population are also taken into account, we present the data as histogram symbolic data (Billard, Diday, 2006). For their analysis we adapt the generalized Ward's hierarchical clustering procedure (Batagelj, 1988). The changes in the pyramids' shapes, and also the changes of the countries inside the main clusters, will be examined for the years 1996, 2001, and 2006.
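As a rough illustration of the clustering idea mentioned in this abstract, the sketch below applies classical Ward's agglomeration to a few made-up age-group histograms. It is not the authors' generalized procedure for histogram symbolic data: the country labels, the three-bin histograms, and the function names `ward_cost` and `ward_cluster` are all assumptions chosen for the example.

```python
from itertools import combinations

def ward_cost(size_a, centroid_a, size_b, centroid_b):
    """Increase in within-cluster sum of squares if the two clusters merge."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(centroid_a, centroid_b))
    return size_a * size_b / (size_a + size_b) * sq_dist

def ward_cluster(histograms, n_clusters):
    """Merge the cheapest pair of clusters until n_clusters remain."""
    # Each cluster is (member labels, size, centroid).
    clusters = [([name], 1, list(h)) for name, h in histograms.items()]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: ward_cost(clusters[p[0]][1], clusters[p[0]][2],
                                    clusters[p[1]][1], clusters[p[1]][2]),
        )
        (members_a, n_a, c_a), (members_b, n_b, c_b) = clusters[i], clusters[j]
        merged_centroid = [(n_a * x + n_b * y) / (n_a + n_b)
                           for x, y in zip(c_a, c_b)]
        merged = (members_a + members_b, n_a + n_b, merged_centroid)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(c[0]) for c in clusters]

# Hypothetical shares of young / middle / old age groups (not real data).
pyramids = {
    "A": [0.40, 0.35, 0.25],
    "B": [0.38, 0.36, 0.26],
    "C": [0.15, 0.35, 0.50],
    "D": [0.17, 0.33, 0.50],
}
groups = ward_cluster(pyramids, 2)
```

Here the two "young" pyramids and the two "old" pyramids end up in separate clusters. Taking gender and population size into account, as the abstract describes, would require a richer data representation than these plain vectors.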

Author(s): Natasa Kejzar, Simona Korenjak-Cerne, Vladimir Batagelj

Statistical Modulation of a Human Health Problem in Albania

Mathematics and Computer Sciences Journal (MCSJ), Volume 2, Sep 2017

Abstract
Air pollution from industrial activity is very dangerous for human health. This paper analyzes data collected at three sites in the south of Albania: two polluted and one unpolluted. Using ANOVA, we analyze the influence of the site in the hematological and pneumological fields. We build a multivariable regression model for pneumology using smoke, time of stay, and age as independent variables. The covariance method applied to the model shows that, when the smoke variable is set aside, there is no difference among the three sites in the pneumological field. The dependence of smoke on time of stay is shown using the multi-ANOVA method.
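To make the ANOVA step concrete, here is a minimal sketch of a one-way ANOVA F statistic for a measurement compared across three sites. The site names and values are invented for illustration and are not the study's data; `one_way_anova_f` is a hypothetical helper, not a function from any particular library.

```python
def one_way_anova_f(groups):
    """Return the F statistic for a one-way ANOVA over a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    # Between-group sum of squares: group means around the grand mean.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Within-group sum of squares: values around their own group mean.
    ss_within = sum(
        sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical measurements at three sites (illustrative only).
site_polluted_1 = [1.0, 2.0, 3.0]
site_polluted_2 = [2.0, 3.0, 4.0]
site_clean = [5.0, 6.0, 7.0]
f_stat = one_way_anova_f([site_polluted_1, site_polluted_2, site_clean])
```

A large F value relative to the F distribution with (2, 6) degrees of freedom would suggest that the site means differ. The sketch stops at the statistic and does not compute a p-value.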

Author(s): Luela Prifti, Etleva Beliu, Shpetim Shehu

Applications of Wavelet-Based Functional Mixed Models to Proteomics and Genomics Data

Mathematics and Computer Sciences Journal (MCSJ), Volume 2, Sep 2017

Abstract
Various genomic and proteomic assays yield high-dimensional, irregular functional data. For example, MALDI-MS yields proteomics data consisting of one-dimensional spectra with many peaks, 2D gel electrophoresis and LC-MS yield two-dimensional images with spots that correspond to peptides present in the sample, and array CGH or SNP chip arrays yield one-dimensional functions of copy number information along the genome. In this talk, I will discuss how to identify candidate biomarkers for various types of proteomic and genomic data using Bayesian wavelet-based functional mixed models. This approach models the functions in their entirety, so it avoids reliance on peak or spot detection methods. The flexibility of this framework in modeling nonparametric fixed and random effect functions enables it to model the effects of multiple factors simultaneously, allowing one to perform inference on multiple factors of interest using the same model fit, while adjusting for clinical or experimental covariates that may affect both the intensities and locations of the peaks and spots in the data. I will demonstrate how to identify regions of the functions that are differentially expressed across experimental conditions, in a way that takes both statistical and clinical significance into account and controls the Bayesian false discovery rate at a pre-specified level. Time allowing, I will also demonstrate how to use this framework as the basis for classifying future samples based on their proteomic and genomic profiles in a way that can also combine information across multiple sources of data, including proteomic, genomic, and clinical. These methods will be applied to a series of proteomic and genomic data sets from cancer-related studies.

Author(s): Jeffrey S. Morris

A semiparametric Bayesian model for examiner agreement in periodontal research

Mathematics and Computer Sciences Journal (MCSJ), Volume 2, Sep 2017

Abstract
An important measure of the severity of periodontal disease is the probing pocket depth (PPD), which is measured on up to 6 sites for each tooth in the mouth. Establishing and monitoring agreement among multiple examiners is critical to high quality periodontal research. We develop a Bayesian hierarchical model that links the true, observed and recorded values of PPD, permitting correlation among the measures within patient. Tooth-site-specific examiner effects are modeled as arising from a Dirichlet process mixture, facilitating discovery of subgroups among the periodontal sites according to degree of agreement with a reference examiner. We analyze data from a PPD calibration study and illustrate the effects of correlation on assessments of examiner agreement.

Author(s): Elizabeth H. Slate and Elizabeth G. Hill

A Bayesian Approach to Inferring the Contribution of Unobserved Ground Conditions to Observed Scores in Sports: The Example of Cricket

Mathematics and Computer Sciences Journal (MCSJ), Volume 2, Sep 2017

Abstract
This paper is part of a wider research programme that uses a dynamic-programming approach to model the choices teams and players make about the amount of risk to take in international cricket. An important confounding variable in this analysis is the ground conditions (size and shape of the stadium, condition of the playing surface, and weather conditions), which affect the trade-off between risk and return that teams and players face. This variable does not exist in our historical data set and would in any event be very difficult to observe accurately on the day of a match. In this paper, we consider a way of estimating a distribution for the ground conditions using only the information contained in the scores and result of the match. In our approach we use the difference between the cumulative distribution function of scores and a probit estimate of the probability of each score being a winning score in order to infer the extent to which high scores on average reflect easy conditions rather than good performance. Using a Monte Carlo method, we estimate the percentage of the variation in total scores that is due to the variation in conditions, and we subsequently use Bayes' law to estimate a distribution of conditions for each match. We develop our method using the example of cricket, and we outline some potential applications of the method to other sporting contests.
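The final Bayes' law step can be pictured with a small discrete sketch: given a prior over ground conditions and a likelihood of the observed score under each condition, the posterior is the normalized product of the two. Everything below is an assumption made for illustration (three condition levels, a normal score likelihood, and all numbers); the abstract does not specify the authors' actual model.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution, used here as a toy score likelihood."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior_conditions(score, prior, mean_score, sd):
    """Bayes' law: posterior over conditions is prior times likelihood, normalized."""
    unnormalized = {
        cond: prior[cond] * normal_pdf(score, mean_score[cond], sd)
        for cond in prior
    }
    total = sum(unnormalized.values())
    return {cond: weight / total for cond, weight in unnormalized.items()}

# Hypothetical prior and expected scores per condition (not estimated from data).
prior = {"hard": 0.25, "neutral": 0.50, "easy": 0.25}
mean_score = {"hard": 200.0, "neutral": 240.0, "easy": 280.0}
posterior = posterior_conditions(290.0, prior, mean_score, sd=30.0)
```

With these toy numbers, a high observed score shifts most of the posterior mass toward the "easy" condition, which is the kind of inference the abstract describes in words.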

Author(s): Scott R. Brooker, Seamus Hogan