3 Establishing Soil Background
3.1 Introduction
This section includes information on establishing both default soil background and site-specific soil background. Default background studies are typically conducted by regulatory agencies or owners of multiple project sites to address the need for understanding background across a broad area. In contrast, site-specific background is established for a particular cleanup site. When background concentrations are greater than risk-based concentrations, a comparison of site and background concentrations may be used to make decisions concerning appropriate remedial actions, including evaluating which potential risks can be reduced or controlled by remedial actions and which risks will likely remain due to soil background concentrations.
Section 9, Section 10 and Section 11 provide descriptions of sampling, laboratory analytical methods and statistical methods that are relevant to establishing soil background. Please reference Framework 1 and Framework 2, which are provided to depict the process generally used to establish default soil background and site-specific soil background, respectively. Other items important when establishing soil background and using it in risk assessment are a conceptual site model (CSM) and data quality objectives (DQOs) (Section 8). This section is intended to highlight key considerations specific to establishing default soil background and site-specific background, and references details in these other sections when appropriate.
3.1.1 Default soil background
Default soil background is generally established by regulatory agencies for a larger area (for example, state, urban region, or unique geologic zone) that shares similar physical, chemical, geological, and biological characteristics. Regulatory agencies use default soil background as a screening tool to determine whether contaminant concentrations at an individual site are generally within the background concentrations of the larger area. A single value, the background threshold value (BTV), is often used to represent soil background because it is a simple way to screen sites, although soil background is more properly described by a range of values.
Many states have default soil background values relevant to the entire state or different regions of the state that can be compared to cleanup site concentrations to determine whether site contaminant concentrations are consistent with background conditions. Most regulatory agencies do not require remedial action for contaminants consistent with appropriate background concentrations (that is, site concentrations are at or below background concentrations). For this document, default soil background will be described as a single value, which is consistent with findings from our state survey and investigation of regulatory guidance (Section 12 and Section 13). Since default soil background values will be used to evaluate a wide range of sites, it is typically established using conservative assumptions or statistical parameters. Default soil background can be established for both natural and anthropogenic ambient soil background concentrations.
Conducting a default soil background study to derive default background values tailored to the information needs is optimal, but not always feasible since this requires significant time and resources. It may be appropriate to use an existing soil background study to establish default soil background if the existing study design and data objectives meet the informational needs of the background study. As reflected in the States Survey (Section 12 and Section 13), not every jurisdiction allows use of anthropogenic background to evaluate site conditions. Please reference Framework 1, which depicts the process generally used to establish default soil background.
3.1.2 Site-specific soil background
Site-specific soil background is generally established for an area of limited geographic scope that represents one specific project site (for example, an incinerator cleanup site, a railroad yard cleanup site). This is generally a more accurate way to evaluate whether site chemical concentrations are representative of background since it uses information relevant to a specific site in a limited geographical area. In many cases, site-specific soil background can be established for both natural and anthropogenic ambient soil background concentrations. As reflected in the States Survey (Section 12 and Section 13), not every jurisdiction allows use of anthropogenic background to evaluate site conditions. Please reference Framework 2, which depicts the process generally used to establish site-specific soil background.
If the soil chemical concentrations at a site exceed default soil background values, most regulatory agencies allow responsible parties to complete a more refined evaluation to establish site-specific background. An area that has similar physical, chemical, geological, and biological characteristics as the cleanup site being evaluated, but has not been subjected to the same chemical releases as the cleanup site is used to represent site-specific background. The physical, chemical, geological, and biological characteristics of the site being evaluated and the soil background reference area used to establish site-specific background are generally more comprehensively characterized when establishing site-specific background.
Site-specific soil background may be established using a dataset from either:
- an existing soil background study that was conducted for another purpose if it has been evaluated to ensure it is appropriate to use
- a site-specific soil background study conducted specifically to establish the soil background for the site being evaluated
Once identified, a site-specific soil background dataset can be used in several ways, including to:
- establish a site-specific soil background threshold value (BTV)
- compare a site-specific soil background dataset to a site investigation dataset
The appropriate study design will depend on project goals and regulatory agency requirements. When conducting a site-specific background evaluation, it is common to both establish a BTV and compare the central tendencies of the background and site datasets. Establishing a site-specific soil BTV and comparing it to site contaminant concentrations can determine if the maximum site concentrations are within the range of soil background concentrations and can help identify potential localized contamination (hot spots) for further investigation. In contrast, comparing the central tendency of a site dataset to that of a site-specific background dataset can determine if there may be slight but pervasive contamination. The two procedures are therefore complementary, as they test for the presence of different types of contamination, and they can be performed together. If a given chemical in a site dataset fails either test, then it can be examined further using geochemical evaluation or environmental forensics to confirm or rule out the actual presence of site-related contamination.
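The two complementary checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the BTV is taken here as the 95th percentile of the background data, and central tendencies are compared with a one-sided Wilcoxon rank-sum test using a normal approximation without a tie correction. Neither choice comes from this document; the statistic used for a BTV and the appropriate two-sample test depend on regulatory requirements and the methods discussed in Section 11. All function names and the example concentrations are hypothetical.

```python
import math
from statistics import NormalDist


def percentile(values, p):
    """Percentile (p in 0-100) using linear interpolation between order statistics."""
    ordered = sorted(values)
    rank = (len(ordered) - 1) * p / 100.0
    low, high = math.floor(rank), math.ceil(rank)
    return ordered[low] + (ordered[high] - ordered[low]) * (rank - low)


def exceedances(site, btv):
    """Site results above the BTV: candidates for localized contamination."""
    return [x for x in site if x > btv]


def rank_sum_p_value(site, background):
    """One-sided p-value for 'site tends to be higher than background'.

    Wilcoxon rank-sum statistic with average ranks for ties and a normal
    approximation (assumption: sample sizes large enough for the approximation).
    """
    combined = sorted([(v, 0) for v in background] + [(v, 1) for v in site])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average rank of the tied group
        i = j + 1
    n_site, n_bg = len(site), len(background)
    w = sum(r for r, (_, group) in zip(ranks, combined) if group == 1)
    u = w - n_site * (n_site + 1) / 2
    mean_u = n_site * n_bg / 2
    sd_u = math.sqrt(n_site * n_bg * (n_site + n_bg + 1) / 12)
    return 1 - NormalDist().cdf((u - mean_u) / sd_u)


if __name__ == "__main__":
    # Hypothetical concentrations in mg/kg, for illustration only.
    background = [10, 12, 14, 15, 18, 20, 22, 25, 28, 30]
    site = [11, 13, 16, 19, 21, 24, 27, 29, 33, 95]
    btv = percentile(background, 95)
    print("BTV:", btv)
    print("Results above BTV:", exceedances(site, btv))
    print("Rank-sum p-value:", rank_sum_p_value(site, background))
```

A dataset can pass one check and fail the other: a single high result may exceed the BTV without shifting the central tendency, while a small uniform increase may shift the central tendency without any result exceeding the BTV.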
Conducting a site-specific soil background study to derive background values designed to achieve project goals is optimal, but not always feasible since this requires substantial time and resources and can have significant administrative hurdles (for example, site access). An alternative is to use an existing soil background study to establish site-specific soil background if the existing study design and data objectives also meet project needs (Section 3.6).
3.2 Conducting a Soil Background Study
Conducting a background study for the purpose of establishing default or site-specific soil background is preferred to the use of an existing study since it allows the study to be designed to achieve project goals. It is important to perform adequate planning to ensure the collected data will address project goals and regulatory requirements. The following items, which are discussed more fully in this section, should be considered when designing a soil background study intended to determine default or site-specific soil background:
- What type of soil background is being obtained, natural and/or anthropogenic ambient?
- What are the definitions of natural and/or anthropogenic ambient soil background? These will impact what types of areas are included and excluded from sampling.
- Does the soil background reference area have sufficiently similar physical, chemical, geological, and biological characteristics to the cleanup site(s)?
- How is the obtained data intended to be used? Will it be compared to a large number of cleanup sites throughout a state or a more limited area such as a region, city, or county?
- Are sampling design and collection methods comparable? When possible, use the same methods to obtain the data for the cleanup sites that were used for the default soil background samples.
- Are laboratory sample preparation and analytical methods comparable? When possible, the same laboratory sample preparation and analytical methods should be used so the concentrations may be compared to one another. This may not always be possible due to logistic or contract laboratory constraints. If different methods are used, differences in results from those methods need to be considered before deciding whether it is appropriate to use the data.
3.2.1 Natural background
When establishing natural soil background, it is important to carefully consider (1) the intended purpose, (2) the applicable definition of natural background, and (3) which sources will and won’t be included. Natural background soil concentrations can differ depending on soil type and geologic location and origin (Chen, Hoogeweg, and Harris 2001; De Oliveira et al. 2014). In most cases, samples to establish natural background will be obtained only from areas that have not been influenced by discrete/point source releases (for example, hazardous waste or petroleum releases) or diffuse/nonpoint sources (for example, smelter or lead gasoline emissions). Although there may be some cases where a regulatory agency’s definition of natural background may differ slightly from this, for purposes of this document we will use this definition, which is consistent with the definition in Section 2.
There may also be cases where a regulatory agency will allow samples from areas that are not natural background to be included with natural background since they are not from the site under evaluation. In this specific case the definition of background is changed to encompass a release not associated with the site being evaluated plus background. This is also the definition used by the USEPA Superfund program. For the purposes of this document, that would not be considered natural background or anthropogenic ambient soil background, rather it would be considered natural background plus anthropogenic ambient soil background, including point and nonpoint sources not released by the site being evaluated. Samples in proximity to these sources, such as another cleanup site release; stormwater runoff; lead from lead-based gasoline, smelters, or lead-based paint; or other direct or indirect local releases may be included as background samples if allowed by the regulatory agency with authority over the site but clearly should not be used for sites that are not impacted by them. A geochemical evaluation (Section 5) can assist in distinguishing between natural variability and low-level anthropogenic sources in a background dataset.
To ensure that a study appropriately represents the natural background of a selected area (regardless of size), select sample locations that are unlikely to have been impacted by human activities (Section 9.1). Areas typically avoided when selecting soil background reference areas include roadways, developed areas, industrial areas, and identified local anthropogenic releases. In some cases, it may be difficult to exclude all anthropogenic sources within the soil background reference area of interest. These sources may not be obvious when identifying sampling locations but become obvious when the data are analyzed. The normal heterogeneous nature of soil creates natural variability that may mask anthropogenic sources. The more specific and thorough the sampling criteria developed to exclude anthropogenic inputs, the stronger the background dataset will be.
It is useful to establish minimum distances between sample sites and anthropogenic sources when developing the sampling plan. For example, in the U.S. Geological Survey (USGS) continental U.S. soil background study, the following distances from anthropogenic sources were used (Smith et al. 2013):
- more than 200 m (656 ft) from a major highway
- more than 50 m (164 ft) from a rural road
- more than 100 m (328 ft) from a building or structure
- more than 5 km (3.1 mi) downwind of active major industrial activities (for example, power plants or smelters)
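Separation criteria like these lend themselves to a simple screening step when candidate sample locations are being reviewed. The sketch below encodes the four distances listed above and reports which criteria a candidate location fails. The dictionary keys, the function name, and the assumption that distances to each feature have already been measured (for example, in a GIS) are hypothetical and are not taken from Smith et al. (2013).

```python
# Minimum separation distances in metres, as listed above (Smith et al. 2013).
# A location must be MORE than this distance from each feature.
MIN_DISTANCE_M = {
    "major_highway": 200,
    "rural_road": 50,
    "building_or_structure": 100,
    "upwind_major_industry": 5000,
}


def failed_criteria(distances_m):
    """Return the criteria a candidate location fails.

    distances_m maps a feature name to the measured distance in metres.
    Features absent from the mapping are assumed not to be nearby
    (an assumption of this sketch, not a rule from the study).
    """
    return [
        feature
        for feature, minimum in MIN_DISTANCE_M.items()
        if feature in distances_m and distances_m[feature] <= minimum
    ]


if __name__ == "__main__":
    candidate = {"major_highway": 250, "rural_road": 40, "building_or_structure": 300}
    print(failed_criteria(candidate))
```

Keeping the thresholds in one table makes it straightforward to substitute the criteria required by a different study or regulatory agency.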
Another USGS study to determine natural default soil background trace element concentrations in Wisconsin used the following criteria (Stensvold 2012):
- must be in a forested lot, permanent pasture, or otherwise undisturbed area at least 6 m (20 ft) away from a fence line.
- must not be within 1.6 km (1 mi) of any other study sample site.
- must not be within 8 km (5 mi) of any other sample from the same soil group.
- must not be within 30.5 m (100 ft) of a known existing or historical construction site or disturbed area (such as roads, dumps, pits, pipelines, or homesites).
- must not be within 91.4 m (300 ft) of a potential source of contamination (for example, past or present orchard or vegetable-growing area; cattle-dipping site; wood preservation activities; grasshopper bait; land that has had poultry or swine manure, sewage waste, or paper mill sludge applied to it; areas with known releases listed by the Wisconsin Department of Natural Resources Bureau for Remediation and Redevelopment Tracking System).
3.2.2 Anthropogenic ambient soil background
When establishing anthropogenic ambient soil background, it is important to carefully consider the intended purpose and clearly define anthropogenic ambient soil background to identify which sources should and should not be included. For purposes of this document, anthropogenic ambient soil background is defined in Section 2.2; however, the definition for anthropogenic ambient soil background varies more widely among regulatory agencies and other entities than that for natural soil background. In most cases, it is defined as including both natural background and diffuse sources of chemicals that can be transported long distances and are present in similar concentrations across a large area (for example, dioxins or PAHs). Local direct or indirect release sources such as those from a specific facility or a stormwater outfall are excluded, which is consistent with anthropogenic ambient soil background as defined in Section 2.
For example, when investigating lead, the areas near roadways may be excluded since the impacts of lead-based gasoline may not be uniform throughout the area. However, when investigating lead impacts from an air emission source, it may be necessary to understand anthropogenic background near roadways to discern the contribution from the air emission source compared to lead related to emissions on the roads.
USEPA Region 4 and Southeastern States Urban Background Study Example
USEPA Region 4 and southeastern states conducted a collaborative urban background soil study to document background concentrations of surface soils in an urban setting. As seen in the table below, the mean lead concentrations in surface soils in the cities sampled vary from 14.5 mg/kg to 213.8 mg/kg. Each city’s mean lead concentration is below USEPA’s current residential screening level (400 mg/kg). The variability of lead concentrations in the cities sampled represents the varying concentrations of lead that can be present in an urban setting. These data can aid in understanding when there may be contaminant releases versus anthropogenic ambient background. More information on the urban background study can be found at https://www.epa.gov/risk/regional-urban-background-study.
City | # Samples | Minimum (mg/kg) | Maximum (mg/kg) | Mean (mg/kg) | SD (mg/kg) |
Chattanooga, TN | 50 | 14 | 580 | 94.8 | 119.4 |
Columbia, SC | 50 | 1.7 | 200 | 39.9 | 37.9 |
Gainesville, FL | 50 | 2 | 110 | 14.5 | 18.2 |
Lexington, KY | 50 | 18 | 420 | 84.3 | 89.3 |
Louisville, KY | 50 | 25 | 1100 | 163.7 | 190.5 |
Memphis, TN | 50 | 13 | 1000 | 122.5 | 199.7 |
Raleigh, NC | 50 | 7.2 | 180 | 32.9 | 36.4 |
Winston-Salem, NC | 50 | 20 | 1400 | 213.8 | 241.2 |
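The summary statistics in the table can be checked against the screening level with a short script. The sketch below uses only the values in the table and the 400 mg/kg screening level cited in the text; the variable and function names are hypothetical. It also computes the coefficient of variation (SD divided by mean) as a rough indicator of how variable each city's data are; that indicator is an addition for illustration, not part of the study.

```python
# City: (minimum, maximum, mean, SD) lead concentration in mg/kg, from the table above.
LEAD_SUMMARY = {
    "Chattanooga, TN": (14, 580, 94.8, 119.4),
    "Columbia, SC": (1.7, 200, 39.9, 37.9),
    "Gainesville, FL": (2, 110, 14.5, 18.2),
    "Lexington, KY": (18, 420, 84.3, 89.3),
    "Louisville, KY": (25, 1100, 163.7, 190.5),
    "Memphis, TN": (13, 1000, 122.5, 199.7),
    "Raleigh, NC": (7.2, 180, 32.9, 36.4),
    "Winston-Salem, NC": (20, 1400, 213.8, 241.2),
}

SCREENING_LEVEL_MG_PER_KG = 400  # residential screening level cited in the text


def cities_with_mean_above(level):
    """Cities whose mean lead concentration exceeds the given level."""
    return [city for city, (_, _, mean, _) in LEAD_SUMMARY.items() if mean > level]


def cities_with_max_above(level):
    """Cities with at least one sample above the given level."""
    return [city for city, (_, maximum, _, _) in LEAD_SUMMARY.items() if maximum > level]


def coefficient_of_variation(city):
    """SD divided by mean for one city (unitless)."""
    _, _, mean, sd = LEAD_SUMMARY[city]
    return sd / mean


if __name__ == "__main__":
    print("Mean above level:", cities_with_mean_above(SCREENING_LEVEL_MG_PER_KG))
    print("Maximum above level:", cities_with_max_above(SCREENING_LEVEL_MG_PER_KG))
    for city in LEAD_SUMMARY:
        print(city, round(coefficient_of_variation(city), 2))
```

With the tabulated values, no city mean exceeds 400 mg/kg, but five cities (Chattanooga, Lexington, Louisville, Memphis, and Winston-Salem) have a maximum above it. This illustrates why a comparison based on means and a comparison based on individual results can lead to different conclusions.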
Anthropogenic ambient soil background may also exclude emissions from a current local source, such as a smelter or a refinery, and areas near a stormwater conveyance. Historic sources may also need to be considered. In urban areas where industrial activity has taken place throughout the region with facilities coming and going over decades, exclusion of all local sources may be more difficult. Scenarios where other sources might be included in anthropogenic ambient soil background are discussed further below.
It may not be the intent of the default background study to avoid all anthropogenic sources, but rather to obtain samples that reasonably represent conditions in an area or region. The objective might be to include all sources that have been released to the same area or region even if they would not be considered diffuse sources from long range transport. Areas where fill has been placed are generally not sampled to determine soil background. Historic records can guide site selection to avoid fill material and soil borings can be collected to confirm the presence of soil horizons that match those mapped in the United States Department of Agriculture (USDA) Natural Resources Conservation Service (NRCS) soil surveys. If an entire area has been filled or the landscape reshaped prior to development, it may be necessary to analyze soil boring samples collected to below the fill/disturbed layer to establish whether the fill material was contaminated.
PAH anthropogenic ambient background
Several state agencies have recognized the ubiquitous presence of PAHs from atmospheric deposition in various guidance documents and provided methodologies for sampling, analysis, and evaluation (Cal DTSC 2009; MADEP 2002). Recently, a PAH study was performed in Milwaukee, Wisconsin, in collaboration with the Wisconsin Department of Natural Resources (Siemering and Thiboldeaux 2021), where samples were collected and analyzed from park areas demonstrated to be undisturbed and undeveloped for more than 80 years that had no evidence of fill material (confirmed by soil borings) and met the following criteria:
- 6 m (20 ft) from roadways and any parking lots,
- 6 m from other asphalt surfaces (bike/walk paths potentially coal tar sealant–coated),
- >1.5 m (5 ft) from any concrete sidewalks or any other structures,
- exclude athletic fields (if necessary, only along edges thereof), areas that suggest overland soil runoff/deposition from rainwater, and areas where street-cleared snow is piled.
At the remaining areas sampled, atmospheric deposition was considered the only potential source of PAHs, supported by the finding that maximum and median concentrations were much higher at the surface (0 to 7 cm) than at depth (15 to 30 cm), where most PAHs dropped below detection limits. PAH concentrations showed no spatial gradients, and statistical analysis indicated that the 3- to 6-ring PAHs were from the same diffuse local sources, with the 2-ring compounds being transported in from outside the region.
The contaminants present in anthropogenic soil background samples can be the result of aerial deposition from diffuse sources (for example, PAHs, dioxins, furans), including smelters, power plants, past home heating with coal, home heating oil, backyard burn barrels, etc., as well as natural sources such as grass or forest fires. In cases such as this, comparisons of specific chemical/congener ratios and principal component analysis can be used as signatures to identify contaminant sources (Siemering and Thiboldeaux 2021) and allow for comparisons to the same ratios in site-specific data. There are also source-specific differences between urban and rural anthropogenic activities. Soils found in densely populated urban areas with long histories of industrial activity will have very different PAH and dioxin mixtures (regardless of concentration) than those found in less populated rural areas.
3.3 Choosing an Area for a Soil Background Study
The primary objective of selecting an area for a soil background study is to find a background location that is free of chemical impacts from the site under investigation and has similar characteristics to the study area.
- A background reference area should be in the vicinity of the cleanup sites being evaluated but should exclude any cleanup sites and local releases.
- If there are regional anthropogenic ambient soil background sources, then they should affect the site and background reference areas similarly.
- Contaminant fate and transport pathways should be similar (for example, potential for runoff).
- Vegetation type should be similar (for example, forested vs. scrub-shrub).
Regardless of proximity, the background reference area and cleanup site should share as many physical, chemical, geological, and biological characteristics as possible. Site similarities are crucial to ensuring that the soil background established is relevant and conservative enough to be used to screen the cleanup sites that will be evaluated against it. Details regarding geologic and hydrologic conditions are presented herein. Section 9.1 provides a more detailed discussion of factors to consider when selecting a background reference area.
3.3.1 Geologic and geochemical considerations
When conducting background studies, it is important to ascertain that soil samples of similar characteristics and origin are being compared.
Geologic variability in soil parent materials plays a crucial role in the elemental composition of the soils in a soil background reference area (further discussed in Section 5). The larger a soil background reference area becomes, the more likely that soil types of varying chemical concentration will be encountered. Parameters that are indicative of soil geochemical composition include:
- lithology
- mineralogy
- soil type
- soil salinity
- cation exchange capacity (CEC)
- percent organic carbon
- soil density
- soil porosity
- soil pH and redox potential
Published soil and geologic surveys can typically provide sufficient information on site attributes. Test soil borings can be used to confirm that soils at the site are consistent with soils in the background reference area. Boring materials may also be tested to confirm soil geochemical attributes. Collecting this information during soil background sampling can aid further analysis, such as identifying causes for minor differences between a site-specific background site and the study site, or further differentiating background from sites sampled to determine regional default background. Soil background samples should be collected from a designated soil background reference area that reflects the scope of the study. For example, if establishing default soil background for the state of Florida, obtain samples throughout Florida or a representative portion of the state to establish a default soil background value. Site concentrations from samples in Florida can be compared to this default soil background value for Florida, particularly if they share the same geology.
If establishing soil background for areas with significantly different geological regions, consider whether it is necessary to establish different default soil background values for each distinct geological region or if regional data will be applied to the entire state. In the 2012 Wisconsin USGS (Stensvold 2012) study, the highest measured default background value for each of the 16 elements investigated was applied to the entire state to simplify regulatory application. Southwest Wisconsin is part of a historic lead and zinc mining district with higher Pb and Zn background values than the rest of the state (52 mg/kg Pb versus 20–30 mg/kg Pb) but creating a separate standard for this region and clearly delineating soils impacted by the ore body was determined to be unfeasible.
Geochemical evaluations employ field observations and analytical data, such as total metals concentrations in discrete soil samples. For example, high metal concentrations may be associated with a specific geologic area (for example, mineralized area) and much lower concentrations are associated with another geologic area. So, in addition to the geological areas being different, the measured concentrations in each area represent unique background concentrations. It is also possible that distinct geologic areas are sufficiently similar chemically. In this scenario, they may not present a unique chemical profile for most metals and may be considered in aggregate for developing background datasets. Geochemistry is discussed further in Section 5.
3.3.2 Hydrogeologic conditions
Saturated soils can affect the concentration of chemicals in soil when the chemicals are soluble, and they can create greater differences between dry weight and wet weight measurements. It is important to understand if saturated soils are present in the soil background reference area (for example, wetlands, creeks, very shallow groundwater) and to avoid these features if the goal of sampling is soil background concentrations. Sampling areas with these features to understand soil background specific to these conditions would more appropriately produce a site-specific dataset (Section 3.8). Conditions such as a wetland or a creek should be mapped and called out in the sampling plan. Other conditions to monitor include precipitation and evapotranspiration.
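The effect of moisture on reported concentrations can be shown with the standard conversion between wet weight and dry weight results. The sketch below assumes the laboratory reports percent solids (dry mass as a percentage of total wet mass). The function name and the example values are hypothetical.

```python
def dry_weight_concentration(wet_conc_mg_per_kg, percent_solids):
    """Convert a wet weight concentration to a dry weight concentration.

    percent_solids is the dry mass as a percentage of the total (wet) sample
    mass. The same chemical mass divided by a smaller (dry) soil mass gives a
    higher concentration.
    """
    if not 0 < percent_solids <= 100:
        raise ValueError("percent_solids must be greater than 0 and at most 100")
    return wet_conc_mg_per_kg / (percent_solids / 100.0)


if __name__ == "__main__":
    # Hypothetical result: 40 mg/kg wet weight in soil that is 80% solids.
    print(dry_weight_concentration(40.0, 80.0))
    # The same wet weight result in saturated soil that is 50% solids.
    print(dry_weight_concentration(40.0, 50.0))
```

Because the gap between the two bases widens as moisture increases, results from saturated soils are especially sensitive to which basis is reported. Background and site data should be on the same basis before they are compared.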
3.4 Sampling
Before planning to sample, it is important to develop the context in which a dataset will be used. Default soil background data will generally be used to screen a large number of sites and will be broadly applied. It is designed to be more conservative than site-specific background data to ensure sites that are not truly background are not screened out. Also consider whether default soil background may be applicable to a rural, suburban, and/or urban area, as well as what caveats should be established to guide future use of the default data, such as applicable geologies.
For example, Ohio’s Environmental Protection Agency is developing background soil concentrations by county with the intention of using the background values in site assessments in those counties. Individual reports are published for each county that describe the process of collecting, analyzing, and publishing background values.
When choosing the most appropriate site-specific sampling design, consider the project DQOs, how the soil background dataset will be used, and what comparisons may be made. It is also important to ensure that the sampling method used is comparable to the sampling method used for the area default soil background determination. In a perfect scenario the same sampling method would be used for the default soil background study and the sites that are being screened using it. This is not always possible since sampling design and methods may differ from site to site for different reasons. A detailed discussion regarding sampling activities is included in Section 9. Key factors in sampling include sampling design (for example, randomized, stratified randomized), numbers of samples, sample type (for example, discrete, composite, or incremental sampling methodology), and sample depth (for example, surface or deeper horizons).
The selection of sampling design is dependent on the goals of the study. If different geologies will be evaluated independently, then a stratified random sampling approach is more appropriate. If all samples will be used collectively to develop a single background dataset, then a simple random sampling approach may be more appropriate. Pros and cons for different sample types are described in Section 9.4.
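The difference between the two designs can be illustrated with a minimal sketch. It assumes that candidate sampling locations have already been screened and listed, and that each location has been assigned to a geologic unit. The function names, location identifiers, and stratum labels are hypothetical, and a real design would also address sample numbers, sample type, and depth as described in Section 9.

```python
import random


def simple_random_sample(locations, n, seed=None):
    """Draw n locations at random from the whole study area."""
    rng = random.Random(seed)
    return rng.sample(locations, n)


def stratified_random_sample(locations_by_stratum, n_per_stratum, seed=None):
    """Draw n_per_stratum locations at random within each stratum.

    Every stratum (for example, a geologic unit) is sampled separately,
    so each one can later be evaluated independently.
    """
    rng = random.Random(seed)
    return {
        stratum: rng.sample(locations, n_per_stratum)
        for stratum, locations in locations_by_stratum.items()
    }


if __name__ == "__main__":
    # Hypothetical candidate locations grouped by geologic unit.
    by_geology = {
        "glacial till": [f"T{i}" for i in range(1, 41)],
        "alluvium": [f"A{i}" for i in range(1, 11)],
    }
    all_locations = [loc for locs in by_geology.values() for loc in locs]
    print(simple_random_sample(all_locations, 10, seed=1))
    print(stratified_random_sample(by_geology, 5, seed=1))
```

Under simple random sampling, a small geologic unit may receive few or no samples, which is acceptable only when all data will be pooled. Stratifying guarantees a minimum number of samples in each unit.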
Sampling depth between site-specific and default soil background datasets should also be comparable. In some cases, there may be areas where this is not possible due to underlying rock, but the goal should be to be as consistent as possible. As an example, the USGS dataset (Smith et al. 2013) included surface samples (0 to 5 cm), and two soil horizons: uppermost mineral zone and partially weathered parent material.
3.5 Laboratory Analysis
It is important to use analytical methods that give substantively similar results across default background, site-specific background, and site investigation studies. Cleanup sites typically use USEPA’s analytical methods, which should also be used for soil background studies. Note that some entities use analytical methods that differ from those validated by the USEPA (see Section 10.4 and Section 14.1). Analytical methods are discussed further in Section 10.
3.6 Using an Existing Soil Background Study
If it is not feasible to conduct a site-specific soil background study, it may be appropriate to use data from a previously conducted soil background study to establish default soil background for a larger area (for example, a state, region, or unique geological area). Section 13 contains a list of some previously conducted studies that might be appropriate to use. This list is not intended to be exhaustive and there are other studies that can be used. Use of an existing study will reduce investigation costs and time. However, using an existing study that is not appropriate may lead to inappropriate site decisions as well as additional delays or cost if it is decided later that a soil background study specific to the needs is necessary. Thus, an existing study must be critically evaluated to ensure it is appropriate to use.
Site attributes and sampling methods may not completely agree between the default background dataset and the cleanup sites being evaluated. Even so, these studies are not necessarily inappropriate for determining background values, as long as the sampling design and methods are adequately explained and comparable. The implications of differing sample design or methods should be well documented, along with any uncertainties in the comparison with background and investigative site values.
It is important to ensure the study includes adequate documentation, including:
- sampling design and methods (for example, anthropogenic versus natural background, sample depths)
- site topography and soil sample locations
- soil boring logs, including composition, stratigraphy, and depth to water (vadose zone thickness)
- quality assurance and quality control (QA/QC) sampling (IDEQ 2018)
The specific considerations for using a default background dataset are similar to the considerations for developing a default background dataset discussed in Section 11.1.1. These considerations are identified in this section, but precautions addressed in Section 11.1.1 should be reviewed as well.
- Purpose of existing study
  - What was the purpose of the existing study and how were the data intended to be used?
  - Does the intended purpose match the purpose of establishing default soil background?
  - Does the study include areas and types of samples that would be included if a site-specific background study were to be conducted?
  - Is the area included in the study representative of the cleanup sites that will be using the data for screening?
  - How old is the study? Are the data and results still relevant and representative?
- Type of soil background
  - Was the goal to sample natural background or anthropogenic ambient soil background?
    - Natural background: Were samples collected outside areas influenced by diffuse anthropogenic sources and point or direct sources?
    - Anthropogenic ambient soil background: Were samples collected in areas affected by diffuse anthropogenic sources but at sufficient distance from direct and indirect releases if these are not included per the regulatory agency (for example, outfalls, roadways, industrial activities)?
- Geologic and geochemical considerations
  - What geologic areas are represented by the soil background study?
  - Are the existing study and the cleanup sites sufficiently similar in physical, chemical, geological, and biological characteristics?
- Hydrogeologic conditions
  - Does the existing study contain samples in sediments or wetlands, or other areas influenced by water?
  - Is this different from the cleanup sites that will be screened?
- Sampling
  - Are the sampling methods used in the existing study sufficiently similar to those used at the cleanup sites that will be screened?
  - Are the sample depths used in the existing study similar to those that will be used at the cleanup sites that will be screened?
- Analytical methods
  - Are the extraction/digestion methods and analytical methods used for the existing study similar to the cleanup sites that will be screened?
  - Are the particle sizes analyzed in the existing study the same as those that will be analyzed in the cleanup sites to be screened?
  - Are the measurements (wet weight or dry weight) used in the existing study the same as those that will be used at the cleanup sites to be screened?
3.7 Background Dataset Analysis
Once a suitable background dataset has been collected or identified, careful data analysis must also take place. Of primary importance are data distribution, how outliers are handled, and statistical software selection. These topics and other data analysis topics are covered in depth in Section 11.
Data distribution is a descriptive statistic, often represented by a graphed curve, which describes all the values within a dataset and the frequency at which those values occur. Not all data are distributed in the same manner, and categories have been developed to describe common data distributions. The most recognized distribution is the normal distribution (Section 11.1, Table 11-1). Statistical tests often have underlying assumptions regarding sampling distribution.
An outlier or an outlying observation refers to an extreme observation in either direction that appears to deviate markedly in value from other measurements of the dataset in which it appears. In practice, only outliers that are demonstrably erroneous or belonging to populations not representative of background conditions should be excluded from the background dataset. In background investigations, typical sources of error that can result in outliers include: (a) transcription error, (b) sampling error, (c) laboratory error, and (d) sampling of media not representative of background conditions as determined by forensic and geochemical analyses. Outliers are discussed in depth in Section 11.5.
Selecting the statistical software package that will be used to analyze the background value dataset will significantly impact the background value determination process. There are many readily available software packages that can be useful for background data analysis (see Section 11.9, Table 11-6). While most of the statistical analysis programs listed will have the capability to conduct a majority of the analytical methods required for background statistical analysis, not all programs will be able to easily conduct all methods.
3.8 Establishing Default or Site-Specific Soil Background
A default or site-specific BTV can be an upper bound comparison value generated from the soil background dataset. To calculate a BTV for soil, it is important to review the dataset and understand the distribution of the data (Section 11.2), determine how nondetect values will be handled (Section 11.3), present the data graphically (Section 11.4), and identify and remove extreme, isolated outlier value(s) (Section 11.5).
Once the background dataset is established, several statistical values are available for use as the BTV for site data comparison. These statistical values are upper bound estimates of the background dataset (definitions from (USEPA 2015)):
- upper percentiles—value below which a specific percentage of the population occurs (for example, 95th percentile).
- upper prediction limits (UPL)—the predicted upper bound value for a single comparison value.
- upper tolerance limit (UTL)—an upper confidence limit on a percentile of the population. For example, a 95-95 UTL is the value below which 95% of the population will fall with 95% confidence.
- upper simultaneous limit (USL)—the upper boundary of the largest value in a background dataset.
- maximum detected value—may result in false positives (for example, the sample set may not be large enough to have fully measured the higher end values), particularly in a small sample set.
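To illustrate how the upper bound statistics listed above differ, the following minimal sketch computes distribution-free (order-statistic) versions of an upper percentile, a UPL for a single future observation, and a 95-95 UTL. It uses only the Python standard library. It is not the ProUCL implementation, and parametric limits (which depend on the fitted distribution) are covered in Section 11. The function names and the example concentrations are hypothetical.

```python
import math
import statistics


def upper_percentile(data, pct=95):
    """Empirical upper percentile (for example, the 95th) of the background data."""
    # quantiles(n=100) returns the 1st..99th percentile cut points.
    return statistics.quantiles(data, n=100, method="inclusive")[pct - 1]


def nonparametric_upl(data, confidence=0.95):
    """Order statistic expected to bound ONE future background value.

    For a continuous population, P(next value <= x_(r)) = r / (n + 1).
    """
    x = sorted(data)
    n = len(x)
    # round() guards against floating-point noise before taking the ceiling.
    rank = math.ceil(round(confidence * (n + 1), 9))
    if rank > n:
        raise ValueError("dataset too small for the requested confidence")
    return x[rank - 1]


def nonparametric_utl(data, coverage=0.95, confidence=0.95):
    """Smallest order statistic that is an upper confidence limit on a percentile.

    x_(r) exceeds the `coverage` quantile with probability P(K <= r - 1),
    where K ~ Binomial(n, coverage). Returns (UTL, achieved confidence).
    """
    x = sorted(data)
    n = len(x)
    achieved = 0.0
    for rank in range(1, n + 1):
        k = rank - 1
        achieved += math.comb(n, k) * coverage**k * (1.0 - coverage) ** (n - k)
        if achieved >= confidence:
            return x[rank - 1], achieved
    raise ValueError("dataset too small for the requested coverage and confidence")


# Hypothetical background concentrations (mg/kg), for illustration only.
background = [2.0 + 0.1 * i for i in range(59)]
p95 = upper_percentile(background)
upl = nonparametric_upl(background)
utl, achieved_confidence = nonparametric_utl(background)
```

With these assumptions, a distribution-free 95-95 UTL cannot be computed at all for fewer than 59 samples, which is one reason the maximum detected value is a weak BTV for small datasets.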
Less frequently, the single statistical value provided for a background dataset is an upper confidence limit (UCL). A UCL represents an upper bound estimate of the mean and, if used, should be compared to a mean value for the site dataset; this is useful information to determine whether there is pervasive, low-level contamination of the site soil. It is not appropriate to use the UCL of the mean for a background dataset in point-to-point comparisons with site data, since the UCL of the mean does not represent an upper bound of the soil background concentration (BTV).
It is important to understand the basis for any published background values when making a comparison to site data. Section 11 provides a more detailed discussion of the pros and cons of selecting one of these values as the background value. As also discussed in Section 11, there are several statistical software packages used to evaluate reference datasets and calculate BTVs, including USEPA’s ProUCL software. USEPA’s ProUCL software is most often used by the regulatory community because it is well documented, relatively easy to use, and specific to the types of statistics that are relevant to the environmental field.
A single sample or a few samples above these upper bound values may not indicate a potential impact to soil above background. As one increases the number of comparisons between site data and the background dataset, the possibility of a true background value in the site dataset exceeding the single background statistic (false positive error) increases.
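The growth of the false positive error with the number of comparisons can be sketched with a short calculation. The example assumes, purely for illustration, that each site sample is an independent draw from the background population and that the BTV equals the true 95th percentile of background, so each comparison has a 5% chance of exceeding the BTV. Real site data are often spatially correlated, so this is an approximation only. The function name is hypothetical.

```python
def chance_of_false_exceedance(n_comparisons, per_comparison_rate=0.05):
    """Probability that at least one true background value exceeds the BTV.

    Assumes independent comparisons, each with the same exceedance probability.
    """
    return 1.0 - (1.0 - per_comparison_rate) ** n_comparisons


# Under these assumptions the chance of at least one exceedance grows quickly
# as more site samples are compared to a single BTV.
rates = {n: chance_of_false_exceedance(n) for n in (1, 5, 10, 20, 50)}
```

Under these assumptions, 20 comparisons give roughly a 64% chance of at least one exceedance even when the site is entirely at background.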
3.8.1 Using a site-specific background dataset
Comparisons of a site-specific background dataset and a contaminated site dataset also can be made using both datasets rather than (or in addition to) comparison of upper end concentrations to a BTV. These comparisons can be made using statistical methods described in Section 11, such as Student’s t-test for normal datasets or Wilcoxon rank sum (WRS) test (also referred to as the “Wilcoxon-Mann-Whitney test” or “Mann Whitney U test”) for datasets that are not normally distributed. These methods are based on comparisons of the central tendencies of these datasets rather than just the upper end of the distribution. The central tendency comparisons are more reflective of the potential for exposure, which is based on an upper estimate of the mean of a dataset. If the statistical tests indicate that the datasets are different, that suggests that exposure to typical concentrations in one dataset is different from exposure to typical concentrations in the other dataset. Use of both BTV comparison and a dataset comparison provides a more complete analysis of the site and background datasets. As noted previously (Section 3.1.2), the two procedures are complementary, because they test for the presence of different types of contamination. If a given chemical in a site dataset fails either test, then it can be examined further using geochemical evaluation or environmental forensics to confirm or rule out the actual presence of site-related contamination.
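As a rough illustration of a two-dataset comparison, the sketch below implements the WRS test with a large-sample normal approximation and a tie correction, using only the Python standard library. It is a one-sided test of whether site concentrations tend to be larger than background. That direction, the absence of a continuity correction, and the example data are assumptions made for illustration. For small datasets or formal reporting, an exact test in an established statistical package is preferable.

```python
import math
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks (ties share the average rank) and tie-group sizes."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        tie_sizes.append(j - i + 1)
        i = j + 1
    return ranks, tie_sizes


def wilcoxon_rank_sum(site, background):
    """One-sided WRS test (alternative: site tends to exceed background).

    Returns (U statistic for the site sample, z score, approximate p-value).
    """
    n1, n2 = len(site), len(background)
    n = n1 + n2
    ranks, tie_sizes = average_ranks(list(site) + list(background))
    rank_sum_site = sum(ranks[:n1])
    u = rank_sum_site - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    tie_term = sum(t**3 - t for t in tie_sizes) / (n * (n - 1))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term)
    if var_u == 0:
        raise ValueError("all values are identical; the test is undefined")
    z = (u - mean_u) / math.sqrt(var_u)
    p_value = 1.0 - NormalDist().cdf(z)
    return u, z, p_value


# Hypothetical concentrations (mg/kg), for illustration only.
site_values = [6.1, 7.4, 8.0, 9.3, 10.2, 11.8, 12.5, 14.0]
background_values = [3.2, 4.1, 4.8, 5.5, 6.0, 6.7, 7.1, 7.9]
u_stat, z_score, p = wilcoxon_rank_sum(site_values, background_values)
```

A small p-value here would suggest that typical site concentrations are higher than typical background concentrations, which complements rather than replaces the point-by-point BTV comparison.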
3.8.2 Advanced methods
If it is found that the methods discussed above cannot be used to establish site-specific background, there are several advanced statistical methods that may be used to extract a soil background set from a site dataset. A more significant degree of professional judgment is necessary when using these methods, which may lead to significant uncertainty. If it is decided to use one of these methods, it is essential that an expert statistician be included on the project team and understand the underlying uncertainties. These methods are not intended to be used by a risk assessor or risk manager without the assistance of a statistician. Use of one of these methods also requires a larger dataset than the statistical methods described previously in this document. Some advanced statistical methods (discussed in Section 11) that can be useful in some situations include:
- iterative graphical approach (Section 3.9)
- multivariate methods
- principal component analysis
- discriminant analyses
- polytopic vector analysis
- soft independent modeling of class analogy
3.9 Extracting Site-Specific Background Dataset from an On-site Dataset
The site-specific background dataset extraction approach described in this section briefly addresses technical issues for environmental scientists and managers faced with how to determine site-specific background level analyte concentrations. The site-specific background dataset extraction approach represents an iterative graphical approach to build consensus when site background cannot be determined following standard methodologies and policies described in guidance documents developed by the USEPA and state agencies. The background extraction approach tends to yield a defensible background dataset of reasonable size (often much larger than the one that is collected by traditional sampling of off-site background reference areas) with geological and anthropogenic (when present) influences comparable to those of the site under study. Several new terms and phrases have been used in this section, which are described in Appendix E. This section describes the reality that “background” is often a negotiated estimate rather than a strict statistical or scientific one. Just as the conceptual site model is “evergreen,” the understanding of background may evolve during the entire course of an environmental project and should be revisited accordingly.
The USEPA’s Office of Solid Waste and Emergency Response (OSWER) has developed several guidance documents (for example, (USEPA 1989), (USEPA 1992), (USEPA 1992), (USEPA 2002), (USEPA 2002) and (USEPA 1995)) covering how a traditional background dataset is to be sampled/collected, how the data are analyzed, and when background data are necessary to perform site and background evaluations. This section is not intended to address federal and/or state agencies’ policy-related decisions on when to collect background samples or how to use background data to achieve cleanup levels/achieve applicable or relevant and appropriate requirements (ARARs). It is emphasized that the background extraction approach should be used only when methods following USEPA policies have failed and/or it is not possible to collect a sufficient amount of traditional background data from unimpacted off-site locations. Additional useful information about background evaluations in soils and extraction of site-specific background can also be found in (USDON 2002) and the ASTM E3242-20 (ASTM 2020) document.
3.9.1 Need for background data
Site managers and risk assessors need to determine whether an analyte at the site is present due to 1) site-related chemical releases; 2) non-site-related anthropogenic sources and influences; and/or 3) inherent natural background variability. Determining a site-specific background with natural and anthropogenic influences comparable to the site is an important aspect of performing exposure and risk assessments, establishing the scope of site-related releases, and determining COPC. Examples of non-site-related anthropogenic sources include ubiquitous polycyclic aromatic hydrocarbons (PAHs) formed during the incomplete burning of organic materials; vehicular exhaust and emissions from tire wear; domestic heating; and pesticide runoff from agricultural practices at other site areas.
3.9.2 When to use the background extraction approach
In the presence of anthropogenic influences and variable site geology, it becomes challenging to identify an off-site background reference area not impacted by site-related activities because of the confounding factors of non-site-related chemical releases and inherent natural variability. In such complex situations, an iterative quantile-quantile (Q-Q) plots-based background extraction approach following a population partitioning method (for example, (Singh, Singh, and Flatman 1994)) can possibly be used to extract analyte concentrations from a broader on-site mixture dataset representing a site-specific background dataset with geological formations and anthropogenic influences comparable to those of the site. It should be noted that there are other population partitioning approaches available in the statistical literature that can also be used to extract a background dataset from an on-site mixture dataset. Specifically, in the multivariate setting (evaluating several analytes simultaneously), multivariate methods including principal component and discriminant analyses ((Anderson 2003), (Johnson and Wichern 2015), (McLachlan 2004), (Wolfe 2010)) can be used to tease out multiple populations potentially present in an on-site mixture dataset and determine the background subpopulation.
In this section, the univariate Q-Q plots-based iterative approach has been used only to extract a site-specific background dataset from the on-site dataset. It should be pointed out that no attempt has been made to determine and specify potential intermediate subpopulations present in the on-site dataset as described in (Singh, Singh, and Flatman 1994). It should also be noted that normal Q-Q plots (as used in this section) are routinely used as an exploratory tool (for example, (Tukey 1977), (Hoaglin, Mosteller, and Tukey 1983)) to identify outliers and multiple populations potentially present in a dataset. In this iterative process, no Gaussian model is used to draw any statistical inference, including estimation and hypothesis testing. Therefore, the reader should not assume that the background extraction process described in this section can be used only when the mixture on-site dataset follows a normal/Gaussian distribution. Once a background dataset with a sufficient number of observations has been established (extracted or traditional), background data distribution is determined using goodness of fit (GOF) tests; many GOF tests are available in the ProUCL 5.1 software. Depending upon the data distribution of the extracted background dataset, hypothesis testing approaches and BTV estimations are used to perform background versus site comparisons. BTVs can be used as screening values to identify COPC and determine site locations exceeding background level concentrations. A BTV represents a parameter in the upper tail of the background population distribution; some statistics used to estimate BTVs include upper prediction limits (UPLs), upper tolerance limits (UTLs), and upper simultaneous limits (USLs). There is no consensus about the use of an upper limit to estimate a BTV. A brief description of these upper limits is presented in Section 11.7 with additional information described in Appendix A. Additional theoretical details can be found in the ProUCL 5.1 Technical Guide (USEPA 2015). An on-site concentration less than the BTV estimate may be interpreted as representative of an unimpacted background location, and an on-site concentration exceeding a BTV estimate is viewed as coming from a potentially contaminated site area.
3.9.2.1 Sites amenable to the use of the iterative background extraction process
The iterative background extraction process may be used at a contaminated site with the following characteristics:
Sites (for example, federal facilities, industrial complexes, mining sites)
- consisting of heterogeneous areas with high natural variability,
- known to have had many on-site releases with many areas of concern (AOCs) and operating units, and
- with known urban development and other anthropogenic activities (for example, farming, use of petroleum products, training and testing performed by the U.S. military).
Attempts to collect an off-site traditional background dataset following methods described in USEPA guidance were found to be deficient for reasons such as inability to identify relevant unimpacted areas and to collect enough off-site background data appropriate to perform statistical background versus site evaluations.
A database already exists consisting of a large number of analytical results (data points) collected over a defined time providing sufficient coverage to the site AOCs, and all stakeholders agree that it is reasonable to assume that the database consists of concentrations that can be used to represent unimpacted background locations. The size of the existing on-site dataset depends upon the sampling efforts performed at the site and size of the site. For smaller sites, the availability of an on-site dataset of size 200–250 data points may be sufficient; however, for larger sites (for example, federal facilities), it is desirable to have the availability of larger (for example, 300–400 data points or greater) on-site datasets providing sufficient coverage to all AOCs present at the site.
3.9.2.2 Assumptions and involvement of the project team
The available on-site data should be large enough to provide sufficient coverage to the site AOCs. In addition to containing concentrations representing locations impacted by site-related releases, the dataset should also contain concentrations representing unimpacted locations. Based on the premise that all environmental site datasets contain background level concentrations, non-site-related anthropogenic concentrations (which may or may not be present), and concentrations indicative of site releases, a normal Q-Q plots-based iterative method represents a viable approach to extract an anthropogenic site-specific background dataset from a broader mixture on-site dataset. As noted above, depending upon the site size, an existing on-site dataset containing at least 200–400 data points may be sufficient to use the background extraction method provided stakeholders agree that it is reasonable to assume that the dataset also contains concentrations representing locations unimpacted by site-related releases. Since most sampling at a site is performed on suspected contaminated areas, the stakeholders need to take this fact into consideration when judging whether it is reasonable to assume that the on-site dataset contains unimpacted data points. For this reason, the methodology is recommended for large datasets where multiple attempts at finding representative background samples have already been made and have failed. The more site samples that have been collected, and the more attempts that have been made to determine background through established guidelines, the more likely it is that the dataset contains unimpacted data points, even though they are not easily discernible as representative background.
The involvement of the project team and site experts is essential for successful application of the iterative normal Q-Q plots-based method to determine an appropriate background breakpoint (BP) and extract a site-specific background dataset from a broader on-site dataset. In this section, a background BP represents a value that distinguishes between background level concentrations and concentrations representing impacted site locations. The background BP is determined using Q-Q plots generated iteratively on the on-site dataset. Because of the inherent subjective/expert decision in determining outliers and multiple populations, this method must be performed with sufficient input and agreement from all stakeholders. Based upon the information provided by iteratively generated Q-Q plots, the project team makes the final determination about an appropriate background BP distinguishing between the concentrations representing a background population and contaminated population representing impacted site locations. From the statistical point of view, the approach can be used on any on-site dataset collected from any environmental medium. However, the applicability of the approach may also depend upon the analyte of interest (for example, PAHs) and the site medium (for example, soil) under investigation. It is recommended that the project team consult experts (for example, soil scientists, geochemists) before using the approach on some datasets, such as PAHs in soil.
3.9.2.3 Treatment of nondetect observations
Nondetects (NDs) do not represent impacted locations if their detection limits are sufficient to identify concentrations of interest. Sometimes, detection limits (DLs)/reporting limits (RLs) associated with ND observations are significantly higher values (for example, PAHs, metals in soil) than the detected observations. The use of NDs with elevated DLs tends to mask detected observations representing contaminated locations. Elevated NDs exceeding the detected observations interfere with the proper determination of a background BP, therefore causing difficulties in the proper extraction of a site-specific background dataset. In most cases, NDs with elevated DLs should be excluded from the extraction process. It is emphasized that only NDs with elevated DLs need to be excluded from the extraction process; all other NDs may stay in the pooled dataset used to extract a background dataset. Once a background BP has been determined, all detect and nondetect observations less than or equal to the background BP are included in the extracted site-specific background dataset.
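The nondetect handling described above can be sketched as a two-stage filter. In this illustration, the record type `SoilResult`, the `dl_cutoff` parameter, and the numeric values are all hypothetical. In particular, the threshold that makes a detection limit "elevated" is a project team decision, so it is passed in rather than computed.

```python
from dataclasses import dataclass


@dataclass
class SoilResult:
    value: float     # detected concentration, or the DL/RL for a nondetect
    detected: bool   # False for a nondetect (ND)


def drop_elevated_nondetects(results, dl_cutoff):
    """Exclude only NDs whose detection limit exceeds the team-selected cutoff.

    All detects and all other NDs stay in the pooled dataset.
    """
    return [r for r in results if r.detected or r.value <= dl_cutoff]


def extract_background(results, background_bp):
    """Keep every detect and nondetect at or below the background breakpoint."""
    return [r for r in results if r.value <= background_bp]


# Hypothetical pooled on-site results (mg/kg), for illustration only.
pooled = [
    SoilResult(2.0, True),
    SoilResult(1.0, False),    # ND with a low DL: retained
    SoilResult(50.0, False),   # ND with an elevated DL: excluded
    SoilResult(30.0, True),
    SoilResult(4.0, True),
]
screened = drop_elevated_nondetects(pooled, dl_cutoff=5.0)
background = extract_background(screened, background_bp=4.0)
```

Keeping the two stages separate makes it easy to document which NDs were excluded before the breakpoint was selected and which observations ended up in the extracted background dataset.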
3.9.3 Using the existing off-site background data—highly recommended when available
The background extraction approach is used when a representative traditional background dataset of adequate size is not available; guidance about the size of the background dataset is provided in Section 9 and Section 11 of this document. If the team is not confident enough to use the existing background dataset to perform background evaluations and wants to use the extraction process on the existing on-site dataset, the extraction process should be used on the combined on-site and the available off-site background data. In this scenario there is no need to separately evaluate (for example, identify outliers) the background dataset. The iterative process on the combined dataset takes care of outliers (if any) present in the existing background dataset.
3.9.4 Statistical approach
Environmental scientists have borrowed the normal probability plots/normal Q-Q plots-based approach from geochemical and mining applications (for example, (Sinclair 1974), (Sinclair 1976), (Sinclair 1983), (Sinclair 1991), (Fleischhauer and Korte 1990), (Halil and Sarac 1988), (Papastergios et al. 2011)). The probability plot/Q-Q plot-based background extraction approach has been used on on-site datasets collected from the various environmental media, including groundwater (for example, (Kim et al. 2015), (Panno et al. 2006), (Panno et al. 2007)), sediments (for example, (Halil and Sarac 1988)), and soils (for example, (Cook 1998), (Matschullat, Ottenstein, and Reimann 2000), (Reimann, Filzmoser, and Garrett 2005), (Reimann and Garrett 2005), (Renez et al. 2011), (Cal DTSC 2009), (HI DOH 2012), (BC Environment 2001)). In related documents available in the literature (and some cited above), the normal Q-Q plots/normal probability plots-based approach has been used as an exploratory tool only to identify outliers and multiple populations present in a mixture dataset.
The exploratory probability plots ((Sinclair 1974), (Sinclair 1976), (Fleischhauer and Korte 1990)) or equivalent Q-Q plots ((Singh, Singh, and Flatman 1994), (Reimann and Garrett 2005)) based method is used to extract a site-specific background dataset from a broader on-site dataset with anthropogenic and geological conditions comparable to those of the site under study. The approach is used on raw untransformed datasets and does not require that the dataset should be normally distributed. In the context of deriving a background dataset from a mixture on-site dataset, a probability plot/Q-Q plot is used as an exploratory tool ((Tukey 1977), (Hoaglin, Mosteller, and Tukey 1983)) to identify multiple populations (and outliers) rather than using it to assess the data distribution. Whether the data are normally or lognormally distributed or follow some other distribution, a normal probability plot in the original raw scale represents a useful tool for exploring the presence of multiple populations and outliers in a dataset.
Normal Q-Q plots are used iteratively to identify locations that can be used to represent site background. Depending on data variability and on-site dataset size, several iterations may be required to determine a subset of lower concentrations that can be used to represent a site background dataset. The discontinuities and inflection points in a Q-Q plot are considered to represent transition between different populations, possibly representing different site areas with varying degree of contamination. When using an on-site dataset consisting of observations from multiple populations, a background BP is selected at a relatively low concentration level (for example, (Sinclair 1974), (Sinclair 1976)), which is determined by the project team using the information provided by the iteratively generated normal Q-Q plots. The inflection points are not always self-evident. In those cases, their identification may rely on expert judgment and that should be recognized and acknowledged by the project team prior to undertaking the process.
Starting from the top of the initial Q-Q plot generated using all data values, discontinuities and inflection points are identified, and new Q-Q plots are generated without using values greater than the inflection point and/or the point of discontinuity. A continuous (without discontinuities and/or inflection points) Q-Q plot (not necessarily exhibiting a straight line) suggests that the dataset comes from a single population. If a Q-Q plot does not represent a continuous graph, the process should be repeated iteratively, removing higher concentrations at each iteration. The iterative process stops when a Q-Q plot displays a continuous pattern without inflection points and/or discontinuities of considerable magnitude as determined by the project team. Based upon continuity, inflection points, and breaks of considerable magnitude present in iteratively computed Q-Q plots, the project team determines a background BP, distinguishing between concentrations representing a background dataset and site data representing impacted site locations. Fleischhauer and Korte (1990) demonstrated that small variations in the estimation of the position of the background BP or the inflection point on a probability plot are unlikely to significantly influence the resulting background concentration breakpoint. By using the iterative Q-Q plots-based approach on a pooled dataset consisting of on-site and off-site concentrations, many on-site locations exhibiting lower concentrations (for example, less than the background BP) will be considered as representing background locations, and background locations exhibiting elevated concentrations (for example, outliers) will not be included in the extracted background dataset.
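The mechanics of the iteration described above can be sketched in a few lines. The example below computes normal Q-Q plot coordinates and repeatedly trims values above the topmost vertical jump that exceeds a user-supplied threshold. The `min_jump` threshold is a hypothetical numeric stand-in for the project team's visual judgment. It flags only discontinuities, not inflection points, and the plotting-position formula (Blom) is one of several conventions. The output is a set of candidate breakpoints for the team to review, not a final background BP.

```python
from statistics import NormalDist


def normal_qq_points(values):
    """Return (theoretical normal quantile, observed value) pairs, sorted ascending."""
    x = sorted(values)
    n = len(x)
    std_normal = NormalDist()
    # Blom plotting positions; other conventions give similar plots.
    return [
        (std_normal.inv_cdf((i - 0.375) / (n + 0.25)), v)
        for i, v in enumerate(x, start=1)
    ]


def topmost_jump(values, min_jump):
    """Scan downward from the highest value.

    Return the value just below the first jump larger than `min_jump`,
    or None if no such jump exists.
    """
    x = sorted(values)
    for i in range(len(x) - 1, 0, -1):
        if x[i] - x[i - 1] > min_jump:
            return x[i - 1]
    return None


def iterate_extraction(values, min_jump):
    """Trim values above each flagged jump until none remain.

    Returns (candidate breakpoints in the order found, remaining values).
    """
    current = sorted(values)
    candidates = []
    while True:
        bp = topmost_jump(current, min_jump)
        if bp is None:
            return candidates, current
        candidates.append(bp)
        current = [v for v in current if v <= bp]


# Hypothetical pooled on-site concentrations (mg/kg), for illustration only.
pooled = [3.1, 4.0, 4.4, 5.2, 5.9, 6.3, 7.0, 7.8, 21.0, 24.5, 60.0]
candidate_bps, remaining = iterate_extraction(pooled, min_jump=3.0)
qq = normal_qq_points(remaining)  # coordinates for the final Q-Q plot
```

Each candidate breakpoint corresponds to one iteration of the process, so plotting `normal_qq_points` for the data remaining after each trim reproduces the sequence of Q-Q plots the project team would review.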
Once a background BP has been agreed upon by all parties and members of the project team, all observations (detects and nondetects) in the pooled on-site dataset less than or equal to the background BP may be considered to represent an extracted site-specific background dataset. The final Q-Q plot of the extracted background data should be fairly continuous and without inflection points representing a single population. Decision-making statistics such as UTLs are computed based upon a dataset representing a single statistical background population (fundamental assumption). Statistical goodness of fit (GOF) tests are performed to determine the distribution of the extracted background dataset. Depending upon the probability distribution of the resultant background dataset, a parametric or a nonparametric upper limit (for example, UTL, USL) is computed to estimate the BTV. Also, depending upon the project status, project objectives, and data needs, background versus site comparisons may also be performed using graphical displays and hypothesis testing approaches described in USEPA guidance documents ((USEPA 2002), (USEPA 2006)) and available in the ProUCL 5.1 software.
3.9.4.1 Step-by-step summary of the iterative background extraction method
A step-by-step summary of the iterative process used to determine a background BP and to extract and establish a site-specific background dataset is described as follows.
- Use exploratory graphical displays (for example, box plots, index plots, and Q-Q plots) and/or hypothesis testing approaches to determine whether there are significant differences in constituent concentrations in the various strata (surface versus subsurface) of an environmental medium (for example, soils, sediments). For constituents with statistically significant differences in surface and subsurface soil concentrations, separate background datasets may be extracted for each stratum; otherwise, one background dataset for all strata combined would be extracted. However, it is up to the project team to decide whether separate background datasets should be extracted even when the concentrations of two or more strata (for example, soil types) are comparable. Statistical methods and graphical displays needed to perform tests listed in this step are available in ProUCL 5.1.
- Use exploratory iterative normal Q-Q plots on the pooled mixture on-site data to determine a background BP, separating background concentrations/locations representing unimpacted locations and concentrations potentially representing locations contaminated by on-site chemical releases. When elevated DLs are associated with NDs, only detected observations, or all detects and NDs (except those with elevated DLs), may be used in this step. However, it is possible that the true background threshold concentration is below all the DLs, and that the detectable concentrations may contain only contaminated data. This determination must be made by the project team.
- NDs may be present in a background dataset; after a background BP has been identified, use all detects and NDs in the pooled dataset less than or equal to the background BP to establish a site-specific extracted anthropogenic ambient soil background dataset.
- Perform GOF tests on the extracted background dataset. Depending upon the data distribution, compute parametric or nonparametric upper limits (UTLs, USLs) to compute BTV estimates. A brief description of UPLs, UTLs, and USLs is provided in Appendix A.
- Optional: Use color-coded index plots to compare impacted on-site data and extracted background data. A color-coded index plot representing a snapshot of the entire on-site area with many AOCs and extracted background data provides added insight to the site managers and the responsible party and helps them make informed cleanup decisions.
The approach described here has been illustrated using an arsenic dataset collected from surface and subsurface soils of a real polluted site. A brief description of the computation of upper limits is provided in Appendix A, a description of index plots is provided in Appendix B, and the terminology used is summarized in Appendix E.
3.9.5 Extracting background-level arsenic concentrations from the on-site soil dataset of a Superfund site
This real dataset example illustrates the site-specific background extraction method described in Section 3.9. The dataset comes from surface (SS) and subsurface (SB) soils of a large Superfund site (Site) containing many AOCs. The Site is very heterogeneous, with varying geology and soil types. The on-site SS and SB soil data were collected from the following AOCs present at the Site: a1, a6, a7, a10, a11, a12, a14, a20, a22, a23, s1, s2, s2, s4, s5, and s6. The Site AOCs are contaminated by site-related releases as well as by non-site-related anthropogenic activities (for example, farming). It is also likely that concentrations of the COPC vary among AOCs because of natural variability in Site geology and soil types. A limited amount of off-site background data (denoted by bk in graphs) was also available. However, because of natural geological variability and the presence of anthropogenic activities, the project team and the state personnel were not confident that the available background data could support defensible background evaluations. The project team was also concerned that additional traditional background data, representing natural and anthropogenic conditions comparable to those of the Site, could not be collected following standard USEPA practices. Therefore, the project team decided to use the iterative Q-Q plot-based method to extract site-specific background datasets and establish sitewide background datasets for the COPC. The existing arsenic dataset, collected from soils of the AOCs and from the off-site background locations, was used to extract a sitewide arsenic background dataset. This example walks through the extraction approach and the computation of BTV estimates based upon the extracted background dataset.
The first step is to determine whether arsenic concentrations in surface soil and subsurface soil are comparable; the data may be combined only if they are not statistically significantly different. Figure 3-1 displays multiple Q-Q plots comparing arsenic in surface and subsurface soils and Figure 3-2 displays an index plot comparing arsenic in surface and subsurface soils (also see Appendix B). In this section, normal Q-Q plots are used to identify multiple populations present in a pooled on-site dataset and determine if the subset consisting of the lowest set of concentrations can be used to represent a site-specific background dataset. Discontinuities (breaks, jumps) and inflection points on a Q-Q plot suggest the presence of data from multiple populations.
The Tarone-Ware test results comparing arsenic in surface soil and subsurface soil are summarized in Table 3-1. The graphical displays shown in Figure 3-1 and Figure 3-2, and the Tarone-Ware test results of Table 3-1 suggest that arsenic concentrations in surface and subsurface soils differ significantly (p-value << 0.05). Therefore, the project team decided to extract separate background datasets for arsenic in surface and subsurface soils. The process used to extract and establish a sitewide arsenic background dataset for surface soils is described as follows.
Table 3-1. Tarone-Ware test results comparing arsenic in surface and subsurface soils
Source: Anita Singh ADI-NV Inc.
Sample 1 Data: As-mg/kg(sb) | |||
Sample 2 Data: As-mg/kg(ss) | |||
Raw Statistics | |||
Statistic | Sample 1 | Sample 2 | |
Number of Valid Data | 370 | 809 | |
Number of Non-Detects | 80 | 53 | |
Number of Detects | 290 | 756 | |
Minimum Non-Detect | 0 | 0.18 | |
Maximum Non-Detect | 2.4 | 4.5 | |
Percent Non-Detects | 21.62% | 6.55% | |
Minimum Detect | 0.49 | 0.34 | |
Maximum Detect | 48.7 | 144 | |
Mean of Detects | 5.029 | 6.313 | |
Median of Detects | 3.7 | 3.7 | |
SD of Detects | 5.382 | 10.06 |
Sample 1 vs Sample 2 Tarone-Ware Test | |||
H0: Mean/Median of Sample 1 = Mean/Median of Sample 2 | |||
TW Statistic | -4.622 | ||
Lower TW Critical Value(0.025) | -1.96 | ||
Upper TW Critical Value(0.975) | 1.96 | ||
P-Value | 3.8042E-6 | ||
Conclusion with Alpha = 0.05 | |||
Reject H0. Conclude Sample 1 <> Sample 2 | |||
P-Value < alpha (0.05) |
Figure 3-1. Q-Q plots comparing arsenic in surface and subsurface soils, using the pooled dataset consisting of arsenic concentrations of the existing background and various AOCs; a horizontal line is displayed at the largest detection limit.
Source: Anita Singh ADI-NV Inc.
Figure 3-2. Index plot comparing arsenic in surface and subsurface soils, using the pooled dataset consisting of arsenic concentrations of the existing background and various AOCs; a horizontal line is displayed at the largest detection limit.
Source: Anita Singh ADI-NV Inc.
An examination of the graphical displays shown in Figure 3-1 and Figure 3-2 (and results of Table 3-1) suggests that surface soil of the Site exhibits greater arsenic concentrations than subsurface soil.
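The decision arithmetic reported in Table 3-1 can be checked with a short Python sketch. It does not implement the Tarone-Ware test itself; it only assumes, as the ±1.96 critical values in the table suggest, that the TW statistic is referred to a standard normal distribution, and it recomputes the two-sided p-value and the percentages of nondetects from the tabulated counts. The function name is illustrative.

```python
import math


def two_sided_normal_p(z):
    """Two-sided p-value for a statistic referred to the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Values transcribed from Table 3-1.
tw_statistic = -4.622
critical_value = 1.96  # two-sided critical value at alpha = 0.05
alpha = 0.05

p_value = two_sided_normal_p(tw_statistic)
reject_h0 = abs(tw_statistic) > critical_value and p_value < alpha

# Consistency checks on the tabulated counts (NDs / valid data).
pct_nd_sample1 = 100 * 80 / 370
pct_nd_sample2 = 100 * 53 / 809

print(f"two-sided p-value: {p_value:.3e}; reject H0: {reject_h0}")
print(f"percent NDs: {pct_nd_sample1:.2f}% and {pct_nd_sample2:.2f}%")
```

Under that normal-reference assumption, the recomputed p-value is on the order of the value shown in the table, and the negative sign of the statistic indicates that Sample 1 tends to have lower concentrations than Sample 2.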
Next, iterative Q-Q plots are generated. Figure 3-3 shows the initial normal Q-Q plot generated using all detected arsenic concentrations collected from surface soil of the AOCs and the existing background (bk) areas. Note that NDs are excluded from the Q-Q plots used to identify the background BP but are included in the extracted background dataset. In Figure 3-3, a large break in the normal Q-Q plot was noted at 92.1 mg/kg and another break was noted around the arsenic concentration of 32.8 mg/kg. To magnify the lower part of the plot and examine its discontinuities more closely, another Q-Q plot was generated using arsenic values less than 33 mg/kg, as shown in Figure 3-4.
Figure 3-3. Normal Q-Q plot of detected arsenic concentrations in the pooled dataset consisting of the existing background (bk) and AOCs surface soil data.
Source: Anita Singh ADI-NV Inc.
Figure 3-4. Normal Q-Q plot of detected arsenic concentrations <33 mg/kg in the pooled dataset consisting of the existing background (bk) and AOCs surface soil dataset.
Source: Anita Singh ADI-NV Inc.
The observation of 22.7 mg/kg shown in Figure 3-4 comes from the existing background dataset (bk) and is an outlier in that dataset. After examining the Q-Q plot shown in Figure 3-4, the project team (using available expert site knowledge) determined that 18.5 mg/kg represents a potential background BP. To check the continuity of the Q-Q plot below this value, another Q-Q plot, shown in Figure 3-5, was generated (with input from the project team) using arsenic values ≤ 18.5 mg/kg. In this figure, a few breaks were noted in the upper part of the Q-Q plot, at arsenic values > 15.1 mg/kg, whereas the lower part, with arsenic values ≤ 15.1 mg/kg, appeared reasonably continuous. To confirm these observations, another Q-Q plot, shown in Figure 3-6, was generated using detected arsenic values ≤ 15.1 mg/kg.
Figure 3-5. Normal Q-Q plot of detected arsenic concentrations ≤18.5 mg/kg in the pooled dataset consisting of the existing background (bk) and AOCs surface soil dataset.
Source: Anita Singh ADI-NV Inc.
Figure 3-6. Normal Q-Q plot of detected arsenic concentrations ≤ 15.1 mg/kg in the pooled dataset consisting of the existing background (bk) and AOCs surface soil data.
Source: Anita Singh ADI-NV Inc.
The Q-Q plot shown in Figure 3-6 is fairly continuous. Input from the project team played an important role at this step. Based upon the information provided by the iterative Q-Q plots shown in Figure 3-3 through Figure 3-6, and taking the conceptual site model (CSM) into consideration, the project team decided to use 15.1 mg/kg as the background BP distinguishing site-specific background from contaminated on-site concentrations.
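The iterative Q-Q procedure described above can be approximated numerically, as a screening aid only. The following Python sketch computes normal Q-Q coordinates and ranks the jumps between consecutive ordered values, then repeats the ranking on a truncated subset. The plotting positions (i − 0.5)/n, the helper names, and the data are assumptions made for this illustration; ProUCL may use different plotting positions, and the numeric ranking does not replace visual inspection or the project team's judgment.

```python
from statistics import NormalDist


def normal_qq_points(values):
    """Return (theoretical normal quantile, ordered value) pairs.

    Plotting positions (i - 0.5) / n are an assumption of this sketch.
    """
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    return [
        (std_normal.inv_cdf((i - 0.5) / n), x)
        for i, x in enumerate(ordered, start=1)
    ]


def largest_gaps(values, k=2):
    """Return the k largest jumps between consecutive ordered values.

    Each entry is (gap, lower value, upper value), largest gap first.
    """
    ordered = sorted(values)
    gaps = [(hi - lo, lo, hi) for lo, hi in zip(ordered, ordered[1:])]
    return sorted(gaps, reverse=True)[:k]


# Hypothetical detected concentrations (mg/kg); these are not the Site data.
detects = [1.2, 2.0, 2.4, 3.1, 3.7, 4.4, 5.0, 6.2, 14.0, 15.5, 40.0, 95.0]

# First pass: the largest breaks in the full set of detects.
first_pass = largest_gaps(detects, k=2)

# Second pass: truncate below a candidate cutoff and look again.
zoomed = [x for x in detects if x < 14.0]
second_pass = largest_gaps(zoomed, k=1)
points = normal_qq_points(zoomed)

print(first_pass)
print(second_pass)
```

Each pass corresponds to one regenerated Q-Q plot: breaks that are hidden by the scale of the full dataset become visible once the larger values are removed.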
Site-specific Background BP and Extracted Background Data (bk-extrct): All surface soil arsenic concentrations (detects and NDs) less than or equal to the background BP, 15.1 mg/kg, were used to establish a site-specific background dataset. Figure 3-7 exhibits an exploratory normal Q-Q plot (including detects and NDs) based upon the extracted arsenic background dataset, which is labeled as bk-extrct in the graphs.
Figure 3-7. Normal Q-Q plot based upon the extracted background data (bk-extrct) for arsenic in surface soil with concentrations ≤ 15.1 mg/kg (detects and nondetects).
Source: Anita Singh ADI-NV Inc.
Computing BTV Estimates for Arsenic in Surface Soil: Summary statistics and BTV estimates based upon the extracted background dataset (bk-extrct) are summarized in Table 3-2. The detected background data shown in Figure 3-7 do not follow a discernible distribution; therefore, nonparametric statistics were used to estimate the BTV. In this case, the project team decided to use a 95% USL (= 15.1 mg/kg) as an estimate of the BTV.
Table 3-2. Calculation of BTV estimates for arsenic in surface soils
Source: Anita Singh ADI-NV Inc.
As-mg/kg-Bk-Extrct | |||
General Statistics | |||
Total Number of Observations | 758 | Number of Missing Observations | 0 |
Number of Distinct Observations | 213 | ||
Number of Detects | 705 | Number of Non-Detects | 53 |
Number of Distinct Detects | 194 | Number of Distinct Non-Detects | 33 |
Minimum Detect | 0.34 | Minimum Non-Detect | 0.18 |
Maximum Detect | 15.1 | Maximum Non-Detect | 4.5 |
Variance Detect | 8.867 | Percent Non-Detects | 6.992% |
Mean Detected | 4.355 | SD Detected | 2.978 |
Mean of Detected Logged Data | 1.267 | SD of Detected Logged Data | 0.647 |
Critical Values for Background Threshold Values (BTVs) | |||
Tolerance Factor K (for UTL) | 1.74 | d2max (for USL) | 3.806 |
Data do not follow a Discernible Distribution (0.05) | |||
Nonparametric Upper Limits for BTVs (no distinction made between detects and nondetects) | |||
Order of statistic, r | 729 | 95% UTL with 95% Coverage | 12.1 |
Approximate f | 1.279 | Confidence Coefficient (CC) achieved by UTL | 0.923 |
95% UPL | 11.81 | 95% USL | 15.1 |
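The confidence coefficient reported for the nonparametric UTL in Table 3-2 can be examined with a short Python sketch. It assumes the standard order-statistic result that the r-th smallest of n observations covers the p-th population quantile with confidence P(Binomial(n, p) ≤ r − 1); whether ProUCL uses exactly this formula is an assumption here, and the function name is illustrative.

```python
from math import comb


def utl_confidence(n, r, coverage):
    """Confidence that the r-th smallest of n observations is at or above
    the `coverage` quantile, i.e. P(Binomial(n, coverage) <= r - 1)."""
    tail = 1.0 - coverage
    # Equivalent form: at least n - r + 1 observations exceed the quantile,
    # so subtract the probability that n - r or fewer exceed it.
    prob_too_few_exceed = sum(
        comb(n, y) * tail**y * coverage ** (n - y) for y in range(n - r + 1)
    )
    return 1.0 - prob_too_few_exceed


# Values transcribed from Table 3-2: n = 758 observations, r = 729.
cc = utl_confidence(n=758, r=729, coverage=0.95)
print(f"confidence coefficient for r = 729: {cc:.3f}")

# Classic small-sample check: the maximum of 59 observations is a 95-95 UTL.
print(f"n = 59, r = 59: {utl_confidence(59, 59, 0.95):.4f}")
```

The computed value can be compared against the 0.923 shown in the table; a value below 0.95 indicates that the selected order statistic does not quite reach the nominal 95% confidence level.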
An index plot comparing extracted background arsenic data (in blue) with concentrations of the Site AOCs (not part of the extracted background) is shown in Figure 3-8.
Figure 3-8. Index plot of arsenic in surface soil comparing AOCs data with extracted arsenic background data (bk-extrct) and BTV estimates: 95-95 UTL = 12.1 and 95 USL = 15.1 mg/kg.
Source: Anita Singh ADI-NV Inc.
A single index plot of a COPC represents a comprehensive snapshot of the entire on-site dataset by identifying on-site locations with concentrations exceeding the BTV estimates, AOCs exhibiting elevated constituent concentrations, and AOCs having higher concentrations in comparison with the various other AOCs. From Figure 3-8, it is noted that AOCs a1, a10, a20, a23, a6, s5, and s6 exhibit lower arsenic concentrations that are considered to represent site-specific background, and the remaining data from AOCs a11, a12, a7, s1, s2, and s4 exhibit concentrations much higher than those of the extracted site-specific background data (bk-extrct shown in blue) and BTV estimates. These graphical displays help the site managers and the responsible party make informed cleanup decisions.
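The per-AOC comparison that an index plot conveys visually can also be tabulated. The Python sketch below counts, for each AOC, how many results exceed a BTV estimate. The thresholds are taken from Table 3-2; the `exceedance_summary` helper and the records are hypothetical and are not the Site data.

```python
from collections import defaultdict

UTL_95_95 = 12.1  # mg/kg, 95% UTL with 95% coverage from Table 3-2
USL_95 = 15.1     # mg/kg, 95% USL from Table 3-2


def exceedance_summary(records, threshold):
    """Count, per AOC, how many results exceed the threshold.

    `records` is an iterable of (aoc, concentration) pairs; the result maps
    each AOC to (number exceeding, number of results).
    """
    counts = defaultdict(lambda: [0, 0])
    for aoc, conc in records:
        counts[aoc][1] += 1
        if conc > threshold:
            counts[aoc][0] += 1
    return {aoc: tuple(c) for aoc, c in counts.items()}


# Hypothetical records for illustration only.
records = [
    ("a1", 4.2), ("a1", 9.8),
    ("a7", 30.5), ("a7", 11.0),
    ("s1", 16.2), ("s1", 13.0),
]

above_utl = exceedance_summary(records, UTL_95_95)
above_usl = exceedance_summary(records, USL_95)
print(above_utl)
print(above_usl)
```

AOCs with no exceedances are candidates for being consistent with background, subject to the CSM and the project team's review.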
Optional Exercise: For the present Site, GPS coordinates were also available; therefore, a post plot displaying concentrations of the extracted background data and various AOCs was generated. The optional post plot shown in Figure 3-9 separates unimpacted and impacted locations, which was supported by the Site CSM. The generation of post plots is optional because it requires the availability of GPS coordinates of the sampled locations.
Figure 3-9. Post plot of arsenic in surface soil showing locations exhibiting background-level concentrations (green), intermediate arsenic concentrations (yellow; between 12.1 mg/kg and 15.1 mg/kg), and concentrations potentially representing impacted site locations (red).
Source: Anita Singh ADI-NV Inc.
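The three color classes used in the post plot can be expressed as a simple rule. In this Python sketch, the cutoffs are the 95-95 UTL and the 95% USL from Table 3-2; treating values exactly equal to a cutoff as belonging to the lower class is an assumption of the sketch, and the `post_plot_class` name is hypothetical.

```python
def post_plot_class(conc, utl=12.1, usl=15.1):
    """Assign a post-plot color class to an arsenic result (mg/kg).

    Boundary handling (<=) is an assumption of this sketch.
    """
    if conc <= utl:
        return "green"   # background-level concentration
    if conc <= usl:
        return "yellow"  # intermediate concentration
    return "red"         # potentially impacted location


for value in (4.4, 13.0, 15.1, 32.8):
    print(value, post_plot_class(value))
```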