

Announcement of the 56th Research Meeting


 ◇Theme   Visual Data Mining and Statistical Computing

 ◇Date    Saturday, May 31, 2014, 14:00 to 17:30

 ◇Venue   Conference Room, 3rd floor, 50th Anniversary Hall, Okayama University of Science






Coordinators: Masahiro Kuroda and Yuichi Mori (Okayama University of Science)

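Professor Jürgen Symanzik of Utah State University (USA), who has been involved in the development of ArcView and XGobi, is currently visiting Jeonju University in Korea on sabbatical. Taking this opportunity, we have invited him to Okayama University of Science and will hold a research meeting on the theme "Visual Data Mining and Statistical Computing". Everyone is warmly invited to attend. Please note that the talks will be given in English.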

  Yuichi Mori (Okayama University of Science)


14:05   Wataru Sakamoto (Graduate School of Environmental and Life Science, Okayama University)

"Information criteria for linear mixed effect models: bias correction based on a Monte Carlo method"

In selecting random effect terms and estimating the covariance structure in linear mixed effect (LME) models, the parameter space is constrained to a subset of Euclidean space, and a model with fewer random effect terms lies on a boundary of that region. Hence, if a fitted model has more random effect terms than the true model, AIC overestimates the bias incurred when the maximum log-likelihood is used to estimate the expected log-likelihood.
In this talk, we propose bias-corrected information criteria for LME models based on a Monte Carlo approximation, and compare their performance with existing criteria using example data and a simulation study. We found that they gave less bias than AIC when a larger model was fitted, and that they reduced the chance of wrongly choosing a smaller model.
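
As an illustration only, the general idea of estimating the bias of the maximum log-likelihood by Monte Carlo can be sketched as follows. This is not the speaker's method for LME models: it assumes a much simpler i.i.d. normal model with unknown mean and variance, and the function names (fit_normal, monte_carlo_bias) and the choices of n_rep and seed are hypothetical. Data are simulated from the fitted model, the model is refitted, and the gap between the maximized log-likelihood and the expected log-likelihood is averaged.

```python
import math
import random


def fit_normal(sample):
    """Maximum-likelihood estimates (mean, variance) for an i.i.d. normal sample."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / n
    return mean, var


def monte_carlo_bias(sample, n_rep=4000, seed=0):
    """Monte Carlo estimate of the bias of the maximum log-likelihood.

    Assumption: the fitted normal model plays the role of the truth, and
    replicate samples are drawn from it (a parametric bootstrap).
    """
    rng = random.Random(seed)
    n = len(sample)
    mu_hat, var_hat = fit_normal(sample)
    sd_hat = math.sqrt(var_hat)
    total = 0.0
    for _ in range(n_rep):
        replicate = [rng.gauss(mu_hat, sd_hat) for _ in range(n)]
        m, v = fit_normal(replicate)
        # Log-likelihood of the replicate at its own MLE (m, v).
        max_loglik = -0.5 * n * math.log(2 * math.pi * v) - 0.5 * n
        # Expected log-likelihood of n fresh observations from the
        # "true" N(mu_hat, var_hat), evaluated at (m, v); closed form.
        expected_loglik = (-0.5 * n * math.log(2 * math.pi * v)
                           - n * (var_hat + (mu_hat - m) ** 2) / (2 * v))
        total += max_loglik - expected_loglik
    return total / n_rep


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [data_rng.gauss(0.0, 1.0) for _ in range(50)]
    print("estimated bias:", monte_carlo_bias(data))
    print("AIC penalty (number of parameters):", 2)
```

In this regular toy model the estimate stays close to the number of parameters that AIC uses as its correction; the abstract concerns the boundary case, where that simple correction is too large.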

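14:35   Yuichi Mori, Masahiro Kuroda (Okayama University of Science), Masaya Iizuka (Okayama University), Michio Sakakibara (Okayama University of Science)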

"Performance of acceleration of ALS algorithm in nonlinear PCA "

Nonlinear principal components analysis with optimal scaling (NLPCA-OS) is useful for analyzing mixed measurement level data. The algorithm in NLPCA-OS is based on the alternating least squares (ALS) algorithm, where optimal transformation and low-rank matrix approximation are alternated until convergence.
We have proposed an accelerated ALS algorithm using the vector epsilon algorithm (ve-ALS), which speeds up convergence, and have observed that ve-ALS has a lower computational cost than ordinary ALS in small examples in which all variables are categorical.
Here we evaluate the performance of the proposed ve-ALS by simulation: NLPCA with ve-ALS is applied to several simulated datasets that have large numbers of variables and a variety of mixing rates of numerical and categorical variables. The simulation study indicates that ve-ALS improves performance for all simulated datasets and that the larger the number of categorical variables and the higher the mixing rate, the more ve-ALS reduces the computational cost.
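
To give a feel for the acceleration step, the following is a minimal sketch of one vector epsilon extrapolation applied to three consecutive iterates of a linearly convergent sequence. It is not the authors' NLPCA code: the toy sequence stands in for ALS parameter updates, and the function names (samelson_inverse, vector_epsilon_step) are hypothetical. The update used is the common three-term form, current + inv(inv(previous - current) + inv(next - current)), where inv(x) = x / ||x||^2.

```python
def samelson_inverse(vec):
    """Samelson inverse of a vector: vec / ||vec||^2 (vec must be nonzero)."""
    norm_sq = sum(x * x for x in vec)
    return [x / norm_sq for x in vec]


def vector_epsilon_step(previous, current, following):
    """Extrapolate the limit of a sequence from three consecutive iterates."""
    diff_prev = [p - c for p, c in zip(previous, current)]
    diff_next = [f - c for f, c in zip(following, current)]
    inverse_sum = [a + b for a, b in zip(samelson_inverse(diff_prev),
                                         samelson_inverse(diff_next))]
    correction = samelson_inverse(inverse_sum)
    return [c + d for c, d in zip(current, correction)]


def toy_iterate(step, limit, direction, rate):
    """Toy stand-in for an ALS update: limit + rate**step * direction."""
    return [l + (rate ** step) * d for l, d in zip(limit, direction)]


if __name__ == "__main__":
    limit = [1.0, -2.0, 0.5]       # hypothetical converged parameter vector
    direction = [3.0, 1.0, -2.0]   # hypothetical error direction
    rate = 0.9                     # slow linear convergence
    x0, x1, x2 = (toy_iterate(t, limit, direction, rate) for t in range(3))
    accelerated = vector_epsilon_step(x0, x1, x2)
    print("plain iterate x2:", x2)
    print("accelerated:     ", accelerated)
```

For a sequence whose error shrinks by a single constant factor per step, as in this toy case, the extrapolated point coincides with the limit up to rounding; real ALS sequences are only approximately of this form, so the gain is smaller.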

15:25   Jürgen Symanzik (Utah State University, Department of Mathematics and Statistics)

Presentation #1 "Visual Data Mining as a Tool for Educational Data"

Visual data mining (VDM) enables data analysts in all fields to carry out visual investigations leading to insights into relationships in complex data. In this first presentation, we will begin with the general concept of VDM.
We will then discuss two case studies that demonstrate how a variety of graphical methods can be used to extract interesting information from the underlying data sets. In example (i), we look at the visualization of "states" from an online educational game. In example (ii), we look at the learning progressions in an iPad study of young children.

16:10   Jürgen Symanzik (Utah State University, Department of Mathematics and Statistics)

Presentation #2 "Visual Data Mining via Linked Micromap Plots in R"

Linked micromap (LM) plots are a graphical representation that links spatial information and multiple statistical variables via a series of small maps that highlight the statistical data in accompanying plots.
In 2008, Symanzik and Carr indicated that no implementation of LM plots in R existed at that time. This has changed considerably over the last five years. First, multiple R scripts in support of Carr and Pickle (2010) have been developed and made available. In 2012 and 2013, two full R packages, "micromap" and "micromapST", were released. Moreover, additional micromap examples have been implemented in R by various authors. Most recently, R code has been developed that allows shapefiles, which are available in many Geographic Information Systems, to be adapted for use as the basis for LM plots in R.
We will discuss a variety of examples of regional LM plots, ranging from the United States to numerous countries in South America, and some recent examples from South Korea and China.






