## 121-2 Using Nonparametric Nearest Neighbor Approach to Estimate Sum of Bases.

## Poster Number 1010

Division: S02 Soil Chemistry

Session: General Soil Chemistry

Monday, October 22, 2012

Duke Energy Convention Center, Exhibit Hall AB, Level 1

Sum-of-bases is important for soil classification and for certain evaluations of soil nutrient availability. In soil survey, sum-of-bases values are needed for every horizon of every soil (component) within a map unit. The objective was to develop and validate a model that estimates sum-of-bases with the k-nearest-neighbor (k-NN) approach, using properties readily available in the National Soil Information System (NASIS). The model uses one reference data set derived from the Kellogg National Soil Survey Laboratory characterization database. The reference data set contains soil layers with measured values for sum-of-bases, soil pH, organic carbon (OC), extractable acidity, and CEC or ECEC, together with taxonomic order (or soil order group), family mineralogy class, and horizon name (e.g., Ap) obtained from pedon classifications. The model searches the reference data set for the soils most similar to the target soil, based on a set of attributes (OC, pH, extractable acidity, and CEC/ECEC). The search is restricted to soils with the same soil order, family mineralogy class, and master horizon as the target soil, and additional conditions apply to soils that contain gypsum or have andic soil properties. Before the "distance" (a measure of similarity) between the target soil and each soil in the reference data set is calculated, the attribute values are normalized into temporary variables with a mean of zero and a standard deviation of one. The 10 closest soils (k = 10) in the reference data set are then used to estimate the sum-of-bases. The final estimate uses a weighting system that accounts for the distribution of the 10 closest distances to the target soil. Model validation was conducted using an independent data set of 4,369 soil layers from pedons sampled throughout the US.
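The search-and-weight procedure described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' model: the `Layer` record, its field names, and the helpers `zscore_stats` and `knn_sum_of_bases` are hypothetical; distances are assumed to be Euclidean on z-scored attributes whose statistics come from the matching reference layers; the gypsum and andic conditions are omitted; and plain inverse-distance weighting stands in for the unspecified weighting system. The reference values in the usage lines are made up for illustration and are not data from the study.

```python
import math
from dataclasses import dataclass


@dataclass
class Layer:
    """One soil layer; all field names are hypothetical."""
    order: str           # taxonomic (soil) order
    mineralogy: str      # family mineralogy class
    master_horizon: str  # master horizon, e.g. "A" for an Ap horizon
    attrs: tuple         # (OC, pH, extractable acidity, CEC or ECEC)
    sum_of_bases: float = math.nan  # measured value; unknown for a target layer


def zscore_stats(rows):
    """Per-attribute (mean, population SD) used for z-score normalization."""
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        stats.append((mean, sd or 1.0))  # avoid division by zero for constant columns
    return stats


def knn_sum_of_bases(target, reference, k=10):
    """Estimate sum-of-bases for `target` from its k nearest reference layers."""
    key = (target.order, target.mineralogy, target.master_horizon)
    # Search only layers with the same order, mineralogy class, and master horizon.
    pool = [r for r in reference
            if (r.order, r.mineralogy, r.master_horizon) == key]
    if not pool:
        return None

    # Assumption: normalization statistics are computed from the matching pool.
    stats = zscore_stats([r.attrs for r in pool])

    def z(values):
        return [(x - m) / s for x, (m, s) in zip(values, stats)]

    zt = z(target.attrs)
    # Euclidean distance in standardized attribute space, nearest first.
    ranked = sorted(((math.dist(zt, z(r.attrs)), r.sum_of_bases) for r in pool),
                    key=lambda pair: pair[0])
    nearest = ranked[:k]

    # Hypothetical inverse-distance weights; eps keeps an exact match finite.
    eps = 1e-9
    weights = [1.0 / (d + eps) for d, _ in nearest]
    return sum(w * y for w, (_, y) in zip(weights, nearest)) / sum(weights)


# Illustrative, made-up reference layers (not study data).
reference = [
    Layer("Alfisols", "mixed", "A", (1.2, 6.5, 4.0, 15.0), 11.0),
    Layer("Alfisols", "mixed", "A", (1.0, 6.0, 5.0, 14.0), 9.0),
    Layer("Alfisols", "mixed", "A", (2.5, 5.0, 9.0, 20.0), 8.0),
    Layer("Mollisols", "smectitic", "A", (3.0, 7.2, 2.0, 30.0), 28.0),
]
target = Layer("Alfisols", "mixed", "A", (1.1, 6.2, 4.5, 14.5))
print(knn_sum_of_bases(target, reference, k=2))
```

Standardizing before computing distances keeps an attribute with a large numeric range, such as CEC, from dominating one with a small range, such as OC, which is the reason the abstract gives for normalizing to zero mean and unit standard deviation.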
The model explained 94.5% of the variation in sum-of-bases values, with an RMSE of 2.805 cmol(+) kg⁻¹. The sum of residuals (-0.036 cmol(+) kg⁻¹) indicated a very small bias. The k-NN approach proved useful for predicting sum-of-bases in soil survey when measured data are not available.
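The validation statistics reported in the abstract (variation explained, RMSE, and sum of residuals) can be computed from paired observed and predicted values. This sketch assumes residuals are defined as observed minus predicted and that "variation explained" means 1 - SS_res/SS_tot; the abstract states neither convention, and `validation_metrics` is a hypothetical helper. The input values in the usage line are made up.

```python
import math


def validation_metrics(observed, predicted):
    """R^2, RMSE, and sum of residuals for paired observed/predicted values."""
    n = len(observed)
    # Assumption: residual = observed - predicted.
    residuals = [o - p for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "r2": 1.0 - ss_res / ss_tot,      # fraction of variation explained
        "rmse": math.sqrt(ss_res / n),    # same units as the inputs, e.g. cmol(+) kg^-1
        "sum_residuals": sum(residuals),  # near zero suggests little overall bias
    }


print(validation_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 3.0, 3.0]))
```

A sum of residuals near zero means over- and under-predictions largely cancel, while the RMSE carries the typical magnitude of an individual prediction error.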