
**Abstract**

We use a method of parameterized sub-families of probability distributions to connect empirical likelihood (EL) with parametric likelihoods and discuss EL inference in the framework of parametric likelihood inference. EL inference thereby benefits from theoretical developments in the parametric case. We illustrate the method with general estimating equations and consider M-type linear regression as an example of a practical application where the proposed method promotes conditional EL inference with parameter orthogonality in place of profile EL inference.

Author(s): Mi-Ok Kim
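
For readers unfamiliar with how an empirical likelihood is evaluated under an estimating equation, the sketch below profiles the EL in the simplest scalar case (the mean, with estimating function g_i = x_i − mu) via the usual Lagrange-multiplier dual. It is a generic Python illustration, not the parameterized sub-family construction of the abstract; the function name is hypothetical.

```python
import numpy as np

def log_el_ratio(x, mu, tol=1e-10, max_iter=100):
    """Log empirical likelihood ratio for the mean: with g_i = x_i - mu,
    find the Lagrange multiplier lam solving sum g_i / (1 + lam*g_i) = 0
    by a damped Newton search, then return sum log(n * p_i)."""
    g = np.asarray(x, dtype=float) - mu
    lam = 0.0
    for _ in range(max_iter):
        denom = 1.0 + lam * g
        score = np.sum(g / denom)            # derivative of the dual objective
        hess = -np.sum(g**2 / denom**2)      # always negative
        lam_new = lam - score / hess
        # halve the step until all EL weights stay positive (1 + lam*g_i > 0)
        while np.any(1.0 + lam_new * g <= 0):
            lam_new = (lam + lam_new) / 2.0
        if abs(lam_new - lam) < tol:
            lam = lam_new
            break
        lam = lam_new
    # p_i = 1 / (n * (1 + lam*g_i)), so log ratio = -sum log(1 + lam*g_i)
    return -np.sum(np.log(1.0 + lam * g))

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=50)
print(-2 * log_el_ratio(x, mu=1.0))   # approximately chi-square(1) at the true mean
```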


**Abstract**

Support vector machines (SVMs) and other reproducing kernel Hilbert space (RKHS) based classifiers have drawn much attention recently due to their robustness and generalization capability. All of these approaches construct a classifier from the training sample in a high dimensional space using all available dimensions. The SVM achieves substantial data compression by selecting only the few observations lying on the boundary of the classifier function. However, when the number of observations is not very large (small n) but the number of dimensions is very large (large p), not all available dimensions necessarily carry equal information in the classification context. Selecting only the useful fraction of the available dimensions therefore yields substantial data compression. In this paper we present an algorithmic approach for selecting such an optimal set of dimensions. We reverse and modify the solution proposed by Zhu and Hastie in the context of the Import Vector Machine (IVM) to select an optimal sub-model using only a few observations. For the large p, small n domain (e.g., bioinformatics), our method compares different trans-dimensional models to arrive at an optimal set of dimensions for building the final classifier.

Author(s): Dipak K. Dey, Samiran Ghosh, Yazhen Wang
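
The IVM-based, trans-dimensional selection described in the abstract is not reproduced here; purely as a sketch of choosing a useful subset of dimensions for a kernel classifier in a large p, small n setting, the code below runs a naive greedy forward search scored by cross-validated SVM accuracy. scikit-learn is assumed; the function name and the stopping rule are illustrative only.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def greedy_dimension_selection(X, y, max_dims=10, cv=5):
    """Greedy forward selection of predictor columns for an RBF-kernel SVM."""
    selected, remaining = [], list(range(X.shape[1]))
    best_score = -np.inf
    while remaining and len(selected) < max_dims:
        scores = []
        for j in remaining:
            cols = selected + [j]
            acc = cross_val_score(SVC(kernel="rbf"), X[:, cols], y, cv=cv).mean()
            scores.append((acc, j))
        acc, j = max(scores)
        if acc <= best_score:        # stop when no dimension improves CV accuracy
            break
        best_score = acc
        selected.append(j)
        remaining.remove(j)
    return selected, best_score

# toy "large p, small n" data: only the first 2 of 200 dimensions matter
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 200))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
print(greedy_dimension_selection(X, y, max_dims=5))
```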


**Abstract**

Frequently, classifiers are developed using class-imbalanced data, i.e., data sets where the number of subjects in each class is not equal. Standard classification methods used on class-imbalanced data produce classifiers that do not accurately predict the smaller class. We previously showed that additional challenges arise when the data are both class-imbalanced and high dimensional, i.e., when the number of samples is smaller than the number of measured variables. Most classification methods base the classification rule on a numerical variable produced by the classification algorithm: for example, the probability that a sample belongs to a class (for penalized logistic regression), the proportion of nearest neighbors belonging to a class (for k-NN), or the proportion of bootstrap trees that classify the sample into a given class (for random forests). If the value of this numerical variable is above a pre-specified threshold, the new sample is classified into that class. We evaluated whether we could improve the performance of some classifiers on imbalanced data by estimating the threshold value upon which their classification rule is based. We addressed the issue of how to choose the threshold value, estimating the threshold (on a training set) that maximizes Youden's index (sensitivity + specificity − 1), the positive or negative predictive value, or their sum. The results obtained on independent test sets were evaluated in terms of both class-specific predictive accuracies and class-specific predictive values, and we compared the empirically determined thresholds with the thresholds commonly used in practice. In this talk we will show the simulation-based results obtained using penalized logistic regression models.

Author(s): Lara Lusa, Rok Blagus
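
As a minimal sketch of the thresholding idea, assuming scikit-learn and a toy imbalanced data set (not the authors' simulation design), the code below fits an L2-penalized logistic regression and then estimates, on the training data, the probability cutoff that maximizes Youden's index (sensitivity + specificity − 1).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def youden_threshold(prob, y, grid=None):
    """Return the probability cutoff maximizing sensitivity + specificity - 1."""
    if grid is None:
        grid = np.unique(prob)
    best_t, best_j = 0.5, -np.inf
    for t in grid:
        pred = (prob >= t).astype(int)
        sens = np.mean(pred[y == 1] == 1)
        spec = np.mean(pred[y == 0] == 0)
        j = sens + spec - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t

# class-imbalanced toy data (roughly 15% positives)
rng = np.random.default_rng(2)
n, p = 300, 50
X = rng.normal(size=(n, p))
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-(2 * X[:, 0] - 2.5)))).astype(int)

model = LogisticRegression(penalty="l2", C=1.0, max_iter=1000).fit(X, y)
prob = model.predict_proba(X)[:, 1]
print("empirical threshold:", youden_threshold(prob, y), "vs. the default 0.5")
```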


**Abstract**

Distances and metrics play a key role in many areas of mathematics. Most mathematical studies in coding theory use the Hamming distance. This largely ignores the actual error patterns that occur and provides correction capability for much more than is required in a given situation, resulting in greater redundancy and a drop in communication rate. Some studies have used the Lee distance, but with the Hamming and Lee distances one is constrained to metrics that are not specially suited to a given situation. In this paper we present a very general class of metrics, introduced by Sharma and Kaushik, which includes the Hamming and Lee distances as special extreme cases. The paper presents several results on bounds on parity-check digits and the construction of perfect codes for this generalized class of metrics.

Author(s): Bhu Dev Sharma
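
For reference, the two special extreme cases mentioned above can be computed directly; the short sketch below gives the Hamming and Lee distances between words over Z_q (the generalized Sharma-Kaushik metrics themselves are not implemented).

```python
def hamming_distance(a, b):
    """Number of coordinates in which two words differ."""
    return sum(x != y for x, y in zip(a, b))

def lee_distance(a, b, q):
    """Sum over coordinates of min(|x - y|, q - |x - y|) for words over Z_q."""
    return sum(min(abs(x - y), q - abs(x - y)) for x, y in zip(a, b))

u, v = (0, 1, 4, 2), (0, 3, 0, 2)
print(hamming_distance(u, v))       # 2: the second and third coordinates differ
print(lee_distance(u, v, q=5))      # 3 = min(2, 3) + min(4, 1)
```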


**Abstract**

A new form of fuzzy topology on function spaces, called the starplus nearly compact pseudo regular open fuzzy topology, is introduced. It is observed that such a fuzzy topology is finer than the pointwise fuzzy topology and weaker than a new fuzzy topology that is pseudo ?-admissible on starplus near compacta. A sufficient condition is also obtained under which the starplus nearly compact pseudo regular open fuzzy topology coincides with the pseudo ?-admissible fuzzy topology on starplus near compacta.

Author(s): Atasi Deb Ray, Pankaj Chettri


**Abstract**

Sufficient dimension reduction (SDR) has proven effective in transforming high dimensional regression problems into low dimensional projections, losing no regression information and pre-specifying no parametric model during the dimension reduction phase. However, existing SDR methods suffer from the fact that each dimension reduction component is a linear combination of all the original predictors and thus cannot perform variable selection. In this talk, we propose a regularized SDR estimation strategy that is capable of simultaneous dimension reduction and variable selection. We demonstrate that the new estimator achieves consistency in variable selection without requiring any traditional model, while retaining n^{1/2} estimation consistency of the dimension reduction basis. Both simulation studies and real data analyses are reported.

Author(s): Lexin Li and Howard D. Bondell
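
The regularized estimator proposed in the talk is not shown here; as background, the sketch below implements plain sliced inverse regression (a standard SDR method), whose estimated directions are exactly the kind of all-predictor linear combinations that the proposed penalization is meant to sparsify. The function name and slicing scheme are illustrative assumptions.

```python
import numpy as np

def sliced_inverse_regression(X, y, n_slices=10, n_directions=2):
    """Basic SIR: directions are the top eigenvectors of the between-slice
    covariance of the standardized predictors, mapped back to the original scale."""
    n, p = X.shape
    mu, cov = X.mean(axis=0), np.cov(X, rowvar=False)
    # whiten the predictors (assumes cov is nonsingular)
    evals, evecs = np.linalg.eigh(cov)
    cov_inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    Z = (X - mu) @ cov_inv_sqrt
    # slice on the order statistics of y and average Z within each slice
    slices = np.array_split(np.argsort(y), n_slices)
    M = np.zeros((p, p))
    for idx in slices:
        m = Z[idx].mean(axis=0)
        M += (len(idx) / n) * np.outer(m, m)
    # leading eigenvectors of M, back-transformed, span the estimated subspace
    w, v = np.linalg.eigh(M)
    return cov_inv_sqrt @ v[:, ::-1][:, :n_directions]

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 8))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=500)   # only the first predictor matters
print(np.round(sliced_inverse_regression(X, y, n_directions=1).ravel(), 2))
```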


**Abstract**

Regression of a scalar response on functional predictors (or signals), such as spectra or images, presents a major challenge when, as is typically the case, the dimension of the signals far exceeds the number of signals in the dataset. Fitting such a model meaningfully requires some form of dimension reduction. A proposed approach to this problem extends common multivariate methods (principal component regression (PCR) and partial least squares (PLS)) to handle functional data by also incorporating a roughness penalty. A number of alternative estimation strategies are available; these will be discussed briefly, along with sufficient conditions for consistency. The methods are illustrated using near infrared (NIR) spectra from chemical samples and data from a brain imaging study.

Author(s): Todd Ogden
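
The penalized PCR/PLS estimators of the abstract are not reproduced here; as a simpler illustration of how a roughness penalty regularizes a scalar-on-signal regression when the grid dimension far exceeds the number of signals, the sketch below fits the coefficient function by least squares with a second-difference penalty. All names and the toy data are assumptions.

```python
import numpy as np

def roughness_penalized_fit(X, y, lam=1.0):
    """Estimate a coefficient function beta on the sampling grid by minimizing
    ||y - X beta||^2 + lam * ||D2 beta||^2, where D2 takes second differences
    (a discrete roughness penalty)."""
    p = X.shape[1]
    D2 = np.diff(np.eye(p), n=2, axis=0)          # (p-2) x p second-difference matrix
    A = X.T @ X + lam * D2.T @ D2
    return np.linalg.solve(A, X.T @ y)

# toy data: n = 40 signals observed on a grid of p = 200 points
rng = np.random.default_rng(4)
t = np.linspace(0.0, 1.0, 200)
X = rng.normal(size=(40, 200)).cumsum(axis=1)     # smooth-ish random curves
beta_true = np.sin(2 * np.pi * t)
y = X @ beta_true + 0.5 * rng.normal(size=40)
beta_hat = roughness_penalized_fit(X, y, lam=100.0)
print(np.round(beta_hat[:5], 3))                  # fitted values at the first grid points
```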


**Abstract**

In this paper, estimation of the Weibull distribution shape and scale parameters is accomplished using symmetrically located percentiles from a sample. The procedure requires the algebraic solution of two equations derived from the cumulative distribution function. Bayesian prediction limits for future observations from the two-parameter Weibull distribution are obtained in the presence of outliers of a given type and with random sample size. Numerical examples are used to illustrate the procedure.

Author(s): A. H. Abd Ellah
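
The paper's exact equations and the Bayesian prediction limits are not reproduced; as a sketch of the percentile idea, equating the Weibull CDF to p and 1 − p at two symmetrically located sample percentiles gives closed-form shape and scale estimates. The helper name and the choice p = 0.1 are assumptions.

```python
import numpy as np

def weibull_from_percentiles(x, p=0.1):
    """Shape and scale estimates from the sample p-th and (1-p)-th percentiles,
    using F(x) = 1 - exp(-(x/scale)**shape), i.e.
    log(-log(1 - p)) = shape * (log(x_p) - log(scale))."""
    x_lo, x_hi = np.quantile(x, [p, 1 - p])
    w_lo, w_hi = np.log(-np.log(1 - p)), np.log(-np.log(p))
    shape = (w_hi - w_lo) / (np.log(x_hi) - np.log(x_lo))
    scale = x_lo / (-np.log(1 - p)) ** (1.0 / shape)
    return shape, scale

rng = np.random.default_rng(5)
sample = rng.weibull(2.0, size=500) * 3.0    # true shape 2, true scale 3
print(weibull_from_percentiles(sample, p=0.1))
```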


**Abstract**

Radial basis functions (RBFs) are widely used for image reconstruction because of their grid flexibility and the fast convergence of their approximations; in other words, RBF methods are mesh-free and high-order convergent. We present 2D image reconstruction based on radial basis function interpolation for sharp-edged images using a MATLAB toolbox with a GUI interface. Sharp-edged images reconstructed using RBF interpolation, however, tend to exhibit Gibbs ringing effects. To minimize Gibbs oscillations, the epsilon-adaptive method is employed in the developed GUI toolbox, in which the shape parameter is chosen adaptively so that only the first-order basis function is used in the neighborhood of a discontinuity. To apply this method, the functions called from the toolbox first detect the discontinuities in the image. The expansion coefficients and the concentration function derived from the first derivatives are used to generate an edge map, defined where the product of the expansion coefficients and the concentration functions yields high values. We also use a domain splitting technique to develop a fast reconstruction algorithm. This technique is also used for local image reconstruction near the discontinuity, which will be of most interest to the user. We present RGB color image demonstrations using the toolbox. The non-uniform distribution of the reconstructed grid space and the Fourier filtering technique embedded in the toolbox are also discussed.

Author(s): Vincent Durante, Jae-Hun Jung
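
The toolbox's epsilon-adaptive, edge-aware reconstruction is not reproduced here; the sketch below shows only the basic building block, 1D interpolation with Gaussian radial basis functions and a fixed shape parameter epsilon, applied to a sharp-edged test function where Gibbs-type overshoot may be observed.

```python
import numpy as np

def rbf_interpolate(x_nodes, f_nodes, x_eval, eps=5.0):
    """Gaussian RBF interpolation: solve A c = f with A_ij = phi(|x_i - x_j|),
    phi(r) = exp(-(eps*r)**2), then evaluate sum_j c_j * phi(|x - x_j|)."""
    phi = lambda r: np.exp(-(eps * r) ** 2)
    A = phi(np.abs(x_nodes[:, None] - x_nodes[None, :]))
    c = np.linalg.solve(A, f_nodes)
    B = phi(np.abs(x_eval[:, None] - x_nodes[None, :]))
    return B @ c

# a sharp-edged test function: a step located at x = 0
x_nodes = np.linspace(-1, 1, 21)
f_nodes = np.where(x_nodes < 0, -1.0, 1.0)
x_eval = np.linspace(-1, 1, 401)
recon = rbf_interpolate(x_nodes, f_nodes, x_eval, eps=5.0)
print(float(recon.max()))   # values above 1 would indicate ringing near the edge
```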


**Abstract**

The s^n factorial design, where s is a prime number such as 2, 3, 5, etc., generally confounds a set of different orthogonal factorial effects with s^r blocks (n > r > 1) under one replicate. Thus there is a complete set of orthogonal replicates of different groups of orthogonal factorial effects for constructing a partially confounded s^n factorial design in s^r blocks. When the factorial design partially confounds different orthogonal sets of different groups of orthogonal factorial effects under the complete set of different orthogonal replicates, it constitutes a BIB design preserving their individual properties and providing the same result and detailed information on all factorial effects. The procedure is illustrated with some examples and one practical application.

Author(s): M. Shamsuddin, M. Albassam
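
As a minimal illustration of confounding in an s^n factorial (the single-replicate, s-block case, not the paper's partially confounded construction over a complete set of replicates), the sketch below assigns the 3^2 treatment combinations to s = 3 blocks via the defining contrast x1 + x2 (mod 3), so that this component of the AB interaction is confounded with blocks. The function name and the chosen contrast are assumptions.

```python
from itertools import product

def confounded_blocks(s, n, coeffs):
    """Assign the s**n treatment combinations to s blocks using the defining
    contrast sum(coeffs[i] * x[i]) mod s; that contrast is confounded with blocks."""
    blocks = {b: [] for b in range(s)}
    for combo in product(range(s), repeat=n):
        b = sum(c * x for c, x in zip(coeffs, combo)) % s
        blocks[b].append(combo)
    return blocks

# 3^2 factorial in 3 blocks of 3 runs, confounding one AB interaction component
for b, runs in confounded_blocks(s=3, n=2, coeffs=(1, 1)).items():
    print("block", b, ":", runs)
```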