Journal Name:
- European Journal of Pure and Applied Mathematics
Abstract:
A model building strategy is proposed to improve the probabilistic match in record linkage
with a focus on the loglinear mixture model with two components, one for the matched pairs and one for the
unmatched pairs. In practice, comparison attributes (i.e., covariates) often interact with each other,
leading to a varying number of interaction terms in the loglinear models for the matched and unmatched pairs.
However, the interaction patterns are often not the same for the two components. In particular, because
the number of matched pairs is usually very small compared with that of unmatched pairs in practice,
the model for matched pairs cannot be fitted with the same higher-order interactions as that for
the unmatched pairs. The proposed strategy is data-driven, and attempts to avoid both underfitting
and overfitting due to subjective model specification for the data. Starting from the situation of no
interaction, we add interactions sequentially to the two loglinear components using a forward selection
approach. Specifically, we define the alternatively climbing pathways through mixture families of two
components with higher-order interactions. The mixture models expanded along a pathway are nested
successively. Thus, conventional tests used for comparison of nested models can be applied. Regarding
parameter estimation for the mixture, a simplified method (including the choice of initial values
of parameters) for the EM algorithm is developed, which facilitates the mixture model fitting using
existing packages and functions in sophisticated statistical software like R. Simulation studies have
then been conducted for various situations to assess the model selection approach, and comparisons
of the selected models with the naive model assuming field independence have been made. We have
applied this strategy to the record linkage case study of the 2006 Annual Meeting of the Statistical Society of
Canada (SSC) and identified interactions among certain comparison attributes for both matched and
unmatched pairs; these interactions are not always the same for both mixture components.
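
As a rough illustration of the starting point described above (the no-interaction, field-independence mixture), the following Python sketch runs EM on binary agreement indicators. It is not the simplified EM method developed in the paper, and it does not fit loglinear interaction terms; the function name `em_independence_mixture`, the starting values, and the stopping rule are all assumptions made for this example.

```python
import numpy as np

def em_independence_mixture(gamma, n_iter=200, p_init=0.1, tol=1e-8):
    """EM for a two-component mixture with independent binary fields.

    gamma : (n_pairs, n_fields) array of 0/1 agreement indicators.
    Returns (p, m, u, w): estimated match proportion, per-field agreement
    probabilities for matched (m) and unmatched (u) pairs, and the
    posterior match probability w for each pair.
    All starting values below are illustrative assumptions.
    """
    gamma = np.asarray(gamma, dtype=float)
    n_pairs, n_fields = gamma.shape
    eps = 1e-9
    p = p_init
    m = np.full(n_fields, 0.9)   # assumed start: matched pairs mostly agree
    u = np.full(n_fields, 0.1)   # assumed start: unmatched pairs mostly disagree
    w = np.full(n_pairs, p)
    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a match.
        log_match = np.log(p) + gamma @ np.log(m) + (1 - gamma) @ np.log(1 - m)
        log_unmatch = np.log(1 - p) + gamma @ np.log(u) + (1 - gamma) @ np.log(1 - u)
        diff = np.clip(log_unmatch - log_match, -700, 700)  # avoid overflow in exp
        w = 1.0 / (1.0 + np.exp(diff))
        # M-step: weighted proportions, clipped away from 0 and 1.
        p_new = float(np.clip(w.mean(), eps, 1 - eps))
        m = np.clip((w @ gamma) / max(w.sum(), eps), eps, 1 - eps)
        u = np.clip(((1 - w) @ gamma) / max((1 - w).sum(), eps), eps, 1 - eps)
        converged = abs(p_new - p) < tol
        p = p_new
        if converged:
            break
    return p, m, u, w
```

Under the strategy summarized in the abstract, a fit of this kind would serve only as the baseline model, with interaction terms then added to each component and the nested models compared by conventional tests.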
- 2
141-162