| Title: | Ecometric Models of Trait–Environment Relationships at the Community Level |
|---|---|
| Description: | Provides a framework for modeling relationships between functional traits and both quantitative and qualitative environmental variables at the community level. It includes tools for trait binning, likelihood-based environmental estimation, model evaluation, fossil projection into modern ecometric space, and result visualization. For more details see Vermillion et al. (2018) <doi:10.1007/978-3-319-94265-0_17>, Polly et al. (2011) <doi:10.1098/rspb.2010.2233> and Polly and Head (2015) <doi:10.1017/S1089332600002953>. |
| Authors: | Maria A. Hurtado-Materon [cre, aut], Leila Siciliano-Martina [aut], Rachel A. Short [aut], Jenny L. McGuire [aut], A. Michelle Lawing [cph, aut] |
| Maintainer: | Maria A. Hurtado-Materon |
| License: | MIT + file LICENSE |
| Version: | 1.1.1 |
| Built: | 2026-05-11 06:00:54 UTC |
| Source: | https://github.com/mariahm1995/commecometrics |
Internal utilities and variable declarations that support non-standard evaluation (NSE) and ggplot2 piping.
Builds an ecometric trait space for quantitative environmental variables, estimating the environmental value at each trait bin combination. Also calculates anomalies from the observed values at each point.
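The binning-and-estimation idea can be sketched in base R as follows. This is a hypothetical illustration, not the package's implementation: the function name `ecometric_grid_sketch()` is invented here, bins are equal-width intervals from `cut()`, and the package's likelihood-based estimate is replaced by the mean of the (transformed) environmental values in each occupied bin. Anomalies are taken as observed minus predicted, which is an assumed sign convention.

```r
# Hypothetical sketch (not the commecometrics implementation):
# bin two community-level trait summaries into a 2-D grid, estimate the
# environmental value of each occupied bin, and compute per-point anomalies.
ecometric_grid_sketch <- function(trait1, trait2, env, n1, n2,
                                  transform_fun = function(x) x) {
  # Equal-width bins along each trait axis; labels = FALSE returns integer codes
  bin_1 <- cut(trait1, breaks = n1, labels = FALSE)
  bin_2 <- cut(trait2, breaks = n2, labels = FALSE)
  y <- transform_fun(env)
  key <- paste(bin_1, bin_2, sep = "_")
  # Simplification: per-bin mean instead of a likelihood-based estimate
  bin_est <- tapply(y, key, mean)
  predicted <- unname(bin_est[key])
  data.frame(bin_1, bin_2, observed = y, predicted,
             anomaly = y - predicted)  # assumed sign: observed - predicted
}

# Usage with simulated data
set.seed(1)
t1 <- runif(50)
t2 <- runif(50)
precip <- 200 + 800 * t1 + rnorm(50, sd = 50)
res <- ecometric_grid_sketch(t1, t2, precip, n1 = 4, n2 = 4,
                             transform_fun = function(x) log(x + 1))
head(res)
```

Because each bin's estimate is the mean of its own points, anomalies sum to zero within every bin in this sketch; a likelihood-based estimate would not generally have that property.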
ecometric_model( points_df, env_var = "env_var", transform_fun = function(x) x, inv_transform_fun = function(x) x, grid_bins_1 = NULL, grid_bins_2 = NULL, min_species = 3 )
points_df |
Output first element of the list from |
env_var |
Name of the column containing the environmental variable (e.g., "precip"). |
transform_fun |
Optional transformation function for the environmental variable (e.g., |
inv_transform_fun |
Optional inverse transformation for the environmental variable (e.g., |
grid_bins_1 |
Number of bins for the first trait axis. If |
grid_bins_2 |
Number of bins for the second trait axis. If |
min_species |
Minimum number of species with trait data per point (default = 3). |
A list containing:
points_df |
Filtered input data frame with the following added columns:
|
eco_space |
A data frame representing the ecometric trait space as a grid of trait bins. Each row corresponds to a unique bin combination (x = bin_1, y = bin_2) and includes the predicted environmental value (on the transformed scale if a transformation was applied). |
model |
Linear model object ( |
correlation |
Output from |
diagnostics |
Summary statistics on bin usage and data coverage. |
settings |
Metadata including the modeled trait and transformation functions. |
    # Load internal datasets
    data("geoPoints", package = "commecometrics")
    data("traits", package = "commecometrics")
    data("spRanges", package = "commecometrics")

    # Summarize trait values at sampling points
    traitsByPoint <- summarize_traits_by_point(
      points_df = geoPoints,
      trait_df = traits,
      species_polygons = spRanges,
      trait_column = "RBL",
      species_name_col = "sci_name",
      continent = FALSE,
      parallel = FALSE
    )

    # Fit an ecometric model using annual precipitation (BIO12)
    modelResult <- ecometric_model(
      points_df = traitsByPoint$points,
      env_var = "precip",
      transform_fun = function(x) log(x + 1),
      inv_transform_fun = function(x) exp(x) - 1,
      min_species = 3
    )

    # View correlation between predicted and observed values
    print(modelResult$correlation)

    # View summary of the linear model fit
    summary(modelResult$model)
Builds an ecometric trait space for qualitative environmental variables, estimating the most probable category and the probability of each category at each trait bin combination. Also calculates prediction accuracy and anomalies for each point.
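For the qualitative case, the per-bin category probabilities, most probable category, and overall prediction accuracy can be sketched in base R as below. The function name and the input layout (one bin label and one observed category per point) are assumptions for illustration; probabilities here are simple within-bin proportions, which may differ from how the package estimates them.

```r
# Hypothetical sketch: category probabilities and most probable category per bin
qual_grid_sketch <- function(bin_key, category) {
  category <- as.character(category)
  counts <- table(bin_key, category)
  # Row-wise proportions: probability of each category within each bin
  prob <- prop.table(counts, margin = 1)
  # Most probable category per bin (first category wins ties)
  best <- colnames(prob)[max.col(prob, ties.method = "first")]
  names(best) <- rownames(prob)
  predicted <- unname(best[as.character(bin_key)])
  list(probabilities = prob,
       predicted = predicted,
       prediction_accuracy = 100 * mean(predicted == category))
}

# Usage: five points falling in two trait bins
res <- qual_grid_sketch(bin_key  = c("1_1", "1_1", "1_1", "2_2", "2_2"),
                        category = c("A", "A", "B", "B", "B"))
res$prediction_accuracy
```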
ecometric_model_qual( points_df, category_col, grid_bins_1 = NULL, grid_bins_2 = NULL, min_species = 3 )
points_df |
Output first element of the list from |
category_col |
Name of the column containing the categorical environmental variable (e.g., "vegetation"). |
grid_bins_1 |
Number of bins for the first trait axis. If |
grid_bins_2 |
Number of bins for the second trait axis. If |
min_species |
Minimum number of species with trait data per point (default = 3). |
A list containing:
points_df |
Filtered input data frame with the following added columns:
|
eco_space |
A data frame representing the ecometric trait space as a grid of trait bins. Each row corresponds to a unique bin combination (x = bin_1, y = bin_2) and includes the predicted environmental category ( |
diagnostics |
Summary statistics on bin usage and data coverage. |
settings |
Metadata including the modeled trait. |
prediction_accuracy |
Overall percentage of correct predictions. |
    # Load internal data
    data("geoPoints", package = "commecometrics")
    data("traits", package = "commecometrics")
    data("spRanges", package = "commecometrics")

    # Step 1: Summarize trait values at sampling points
    traitsByPoint <- summarize_traits_by_point(
      points_df = geoPoints,
      trait_df = traits,
      species_polygons = spRanges,
      trait_column = "RBL",
      species_name_col = "sci_name",
      continent = FALSE,
      parallel = FALSE
    )

    # Step 2: Run ecometric model using land cover class as qualitative variable
    modelResult <- ecometric_model_qual(
      points_df = traitsByPoint$points,
      category_col = "vegetation",
      min_species = 3
    )

    # View the percentage of correctly predicted categories
    print(modelResult$prediction_accuracy)
Visualizes the ecometric space for quantitative environmental variables
based on the output from ecometric_model().
ecometric_space( model_out, env_name = "env_var", fossil_data = NULL, fossil_color = "#000000", modern_color = "#bc4749", palette = c("#bc6c25", "#fefae0", "#606c38"), x_label = "Summary metric 1", y_label = "Summary metric 2" )
model_out |
Output from |
env_name |
Name to display for the environmental variable (used in the legend title). |
fossil_data |
Optional. Output from |
fossil_color |
Outline color for fossil data bins (default: "#000000"). |
modern_color |
Outline color for modern data bins (default: "#bc4749"). |
palette |
Vector of colors to use for the gradient scale representing environmental values. |
x_label |
Label for the x-axis in the output plots (default: "Summary metric 1"). |
y_label |
Label for the y-axis in the output plots (default: "Summary metric 2"). |
A ggplot2 object visualizing the ecometric trait-environment surface.
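Because `eco_space` (returned by `ecometric_model()`) indexes bins as x = bin_1 and y = bin_2, the surface can also be drawn without ggplot2 by reshaping it into a matrix for base graphics' `image()`. This is a rough alternative, not what `ecometric_space()` does: the helper name is hypothetical and the sketch ignores fossil and modern outlines.

```r
# Hypothetical helper: place per-bin values into a matrix indexed by
# (bin_1, bin_2) so the surface can be drawn with base graphics.
eco_space_matrix <- function(x, y, value) {
  m <- matrix(NA_real_, nrow = max(x), ncol = max(y))
  m[cbind(x, y)] <- value   # empty bins stay NA
  m
}

# Usage with a toy grid of three occupied bins
m <- eco_space_matrix(x = c(1, 2, 2), y = c(1, 1, 3), value = c(0.5, 1.5, 2.5))
if (interactive()) {
  image(seq_len(nrow(m)), seq_len(ncol(m)), m,
        xlab = "Summary metric 1", ylab = "Summary metric 2")
}
```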
    # Load internal data
    data("geoPoints", package = "commecometrics")
    data("traits", package = "commecometrics")
    data("spRanges", package = "commecometrics")
    data("fossils", package = "commecometrics")

    # Summarize trait values at sampling points
    traitsByPoint <- summarize_traits_by_point(
      points_df = geoPoints,
      trait_df = traits,
      species_polygons = spRanges,
      trait_column = "RBL",
      species_name_col = "sci_name",
      continent = FALSE,
      parallel = FALSE
    )

    # Run ecometric model
    ecoModel <- ecometric_model(
      points_df = traitsByPoint$points,
      env_var = "precip",
      transform_fun = function(x) log(x + 1),
      inv_transform_fun = function(x) exp(x) - 1,
      min_species = 3
    )

    # Reconstruct environments for fossil sites
    recon <- reconstruct_env(
      fossildata = fossils,
      model_out = ecoModel,
      match_nearest = TRUE,
      fossil_lon = "Long",
      fossil_lat = "Lat",
      modern_id = "ID",
      modern_lon = "Longitude",
      modern_lat = "Latitude"
    )

    # Plot the ecometric trait–environment space
    ecometricPlot <- ecometric_space(
      model_out = ecoModel,
      env_name = "Precipitation (log mm)",
      fossil_data = recon
    )

    # Display plot
    print(ecometricPlot)
Visualizes the predicted ecometric space (predicted category) and probability maps
for each category based on the output from ecometric_model_qual().
ecometric_space_qual( model_out, palette = NULL, fossil_data = NULL, fossil_color = "#000000", modern_color = "#bc4749", x_label = "Summary metric 1", y_label = "Summary metric 2" )
model_out |
Output from |
palette |
Optional color vector for categories (its length must match the number of categories). |
fossil_data |
Optional. Output from |
fossil_color |
Outline color for fossil data bins (default: "#000000"). |
modern_color |
Outline color for modern data bins (default: "#bc4749"). |
x_label |
Label for the x-axis in the output plots (default: "Summary metric 1"). |
y_label |
Label for the y-axis in the output plots (default: "Summary metric 2"). |
A list containing:
ecometric_space_plot |
ggplot showing the predicted category across trait space. |
probability_maps |
List of ggplots showing probability surfaces across trait space for each category. |
    # Load internal data
    data("geoPoints", package = "commecometrics")
    data("traits", package = "commecometrics")
    data("spRanges", package = "commecometrics")
    data("fossils", package = "commecometrics")

    # Summarize trait values at sampling points
    traitsByPoint <- summarize_traits_by_point(
      points_df = geoPoints,
      trait_df = traits,
      species_polygons = spRanges,
      trait_column = "RBL",
      species_name_col = "sci_name",
      continent = FALSE,
      parallel = FALSE
    )

    # Run ecometric model for qualitative variable
    modelResult <- ecometric_model_qual(
      points_df = traitsByPoint$points,
      category_col = "vegetation",
      min_species = 3
    )

    # Reconstruct fossil environmental categories
    reconQual <- reconstruct_env_qual(
      fossildata = fossils,
      model_out = modelResult,
      match_nearest = TRUE,
      fossil_lon = "Long",
      fossil_lat = "Lat",
      modern_id = "ID",
      modern_lon = "Longitude",
      modern_lat = "Latitude"
    )

    # Plot qualitative ecometric space
    ecoPlotQual <- ecometric_space_qual(
      model_out = modelResult,
      fossil_data = reconQual
    )

    # Display predicted category map
    print(ecoPlotQual$ecometric_space_plot)

    # Display one of the probability maps
    print(ecoPlotQual$probability_maps[["1"]])
A dataset of fossil sites with estimated trait-distribution summaries and geographic coordinates, used to project past communities onto the modern ecometric space.
fossils
A data frame with the following columns:
Unique identifier for the fossil community
Estimated mean of relative blade length for the fossil site
Estimated standard deviation of relative blade length for the fossil site
Longitude coordinate (decimal degrees)
Latitude coordinate (decimal degrees)
Siciliano-Martina et al. (2024). Ecology and Evolution, 14(10), e70214.
A subset of 100 global sampling points with associated bioclimatic and vegetation variables.
All points overlap with species ranges in the spRanges dataset.
geoPoints
A data frame with the following columns:
Unique identifier for each point
Longitude coordinate (decimal degrees)
Latitude coordinate (decimal degrees)
Mean annual temperature (°C × 10)
Annual precipitation (mm)
Vegetation units (integer code)
Derived from Siciliano-Martina et al. (2024), filtered for overlap with IUCN polygons.
Creates an interactive map to verify species overlap at selected points.
inspect_point_species( traits_summary, point_ids = NULL, n_random = 10, lon_col = "Longitude", lat_col = "Latitude", ID_col = "ID", min_species_valid = 3, env_var = NULL )
traits_summary |
A list output from |
point_ids |
Optional. A vector of specific point IDs to inspect. If NULL, selects |
n_random |
Number of random points to inspect if |
lon_col |
Name of the longitude column in |
lat_col |
Name of the latitude column in |
ID_col |
Name of the ID column in |
min_species_valid |
Minimum number of species with trait data to consider a point valid (default = 3). |
env_var |
Optional. Name of the environmental variable column in |
An interactive leaflet map showing selected points with species list popups.
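The validity rule controlled by min_species_valid amounts to counting the species with trait data at each point. A base-R sketch, assuming a hypothetical long-format input with one row per species at a point (not the package's internal structure); the function name is also hypothetical:

```r
# Hypothetical sketch: count species with non-missing trait values per point
# and flag points that meet the min_species_valid threshold.
flag_valid_points <- function(point_id, trait_value, min_species_valid = 3) {
  n <- tapply(!is.na(trait_value), point_id, sum)
  data.frame(ID = names(n),
             n_species = as.integer(n),
             valid = as.integer(n) >= min_species_valid,
             stringsAsFactors = FALSE)
}

# Usage: point 1 has three species with trait data, point 2 has only one
res <- flag_valid_points(point_id    = c(1, 1, 1, 2, 2, 2),
                         trait_value = c(0.52, 0.61, 0.70, 0.44, NA, NA))
res
```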
    # Load sample data from the package
    data("geoPoints", package = "commecometrics")
    data("traits", package = "commecometrics")
    data("spRanges", package = "commecometrics")

    # Summarize traits at points
    traitsByPoint <- summarize_traits_by_point(
      points_df = geoPoints,
      trait_df = traits,
      species_polygons = spRanges,
      trait_column = "RBL",
      species_name_col = "sci_name",
      continent = FALSE,
      parallel = FALSE
    )

    # Visualize a random sample of 10 points
    inspect_point_species(
      traits_summary = traitsByPoint,
      n_random = 10,
      min_species_valid = 3
    )
Calculates the optimal number of bins for a numeric vector based on Scott's rule. For more details see Scott (1979) https://doi.org/10.1093/biomet/66.3.605
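Scott's rule sets the bin width to h = 3.49 σ n^(-1/3) and the number of bins to ceiling(range / h). The sketch below follows the variant in base R's `grDevices::nclass.scott()`, which rounds the constant to 3.5; whether `optimal_bins()` uses exactly the same constant and rounding is an assumption, and `scott_bins()` is a hypothetical name.

```r
# Sketch of Scott's rule for the number of histogram bins.
scott_bins <- function(x) {
  x <- x[is.finite(x)]
  if (length(x) < 2) return(1L)
  h <- 3.5 * sd(x) * length(x)^(-1/3)   # Scott's bin width
  if (h > 0) as.integer(max(1, ceiling(diff(range(x)) / h))) else 1L
}

set.seed(123)
z <- rnorm(100)
scott_bins(z)
grDevices::nclass.scott(z)   # base R's implementation of the same rule
```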
optimal_bins(x)
x |
Numeric vector. |
Integer representing the optimal number of bins.
    # Example with normally distributed data
    optimal_bins(rnorm(100))
Uses fossil community trait summaries to reconstruct past environmental conditions by projecting them onto a binned ecometric trait space built from modern data. Optionally, it also assigns each fossil point to the nearest modern sampling site to retrieve observed environmental data.
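Projecting fossils onto the modern trait space means binning the fossil summaries with the bin edges derived from modern communities rather than with edges recomputed from the fossils. A sketch using equal-width modern edges and `findInterval()`; the helper names are hypothetical, and marking fossils that fall outside the modern range as NA is an assumption (the package may handle them differently).

```r
# Hypothetical helpers: equal-width bin edges from modern data,
# then assignment of (fossil) values to those fixed edges.
modern_breaks <- function(x, n_bins) {
  seq(min(x), max(x), length.out = n_bins + 1)
}

assign_bins <- function(x, breaks) {
  b <- findInterval(x, breaks, rightmost.closed = TRUE)
  # 0 = below the lowest edge, length(breaks) = above the highest edge
  b[b < 1 | b > length(breaks) - 1] <- NA_integer_
  b
}

edges <- modern_breaks(c(0, 4.2, 7.9, 10), n_bins = 5)   # 0, 2, 4, 6, 8, 10
fossil_bins <- assign_bins(c(-1, 0, 3, 10, 11), edges)
fossil_bins
```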
reconstruct_env( fossildata, model_out, inv_transform = NULL, ci = 0.05, match_nearest = TRUE, fossil_lon = NULL, fossil_lat = NULL, modern_id = NULL, modern_lon = NULL, modern_lat = NULL, crs_proj = 4326 )
fossildata |
A data frame containing fossil trait summaries per fossil site.
Must include columns corresponding to the same two summary metrics used for modern communities,
using the column names specified by |
model_out |
Output list from |
inv_transform |
A function to back-transform environmental estimates to the original scale.
Default is |
ci |
The width of the interval to calculate around the maximum likelihood estimate (default = 0.05). |
match_nearest |
Logical; if TRUE, the function matches each fossil to its nearest modern point based on coordinates (default = TRUE). |
fossil_lon |
Name of the longitude column in |
fossil_lat |
Name of the latitude column in |
modern_id |
Name of the unique ID column in modern points (e.g., "GlobalID"). |
modern_lon |
Name of the longitude column in modern points. Required if |
modern_lat |
Name of the latitude column in modern points. Required if |
crs_proj |
Coordinate reference system to use when converting fossil and modern data to sf format (default = EPSG:4326). |
A data frame (fossildata) with reconstructed environmental values and optional nearest modern point data. Includes the following additional columns:
Numeric bin index for the first trait axis (based on first summary metric of trait distribution of fossil communities).
Numeric bin index for the second trait axis (based on second summary metric of trait distribution of fossil communities).
Maximum likelihood estimate of the environmental variable (on transformed scale if applicable).
Lower bound of the confidence interval around the environmental estimate (transformed scale).
Upper bound of the confidence interval around the environmental estimate (transformed scale).
(Optional) Inverse-transformed environmental estimate, on the original scale.
(Optional) Inverse-transformed lower bound of the confidence interval.
(Optional) Inverse-transformed upper bound of the confidence interval.
(Optional) ID of the nearest modern sampling point (if match_nearest = TRUE).
Additional columns from the matched modern site if match_nearest = TRUE (e.g., observed environmental values).
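One simple way to obtain a maximum-likelihood-style point estimate from the modern environmental values associated with a trait bin is to take the mode of a kernel density estimate. This is an illustrative assumption, not a description of how reconstruct_env() computes its estimate or its interval (ci); the function name is hypothetical and the input values are toy numbers.

```r
# Hypothetical sketch: mode of a Gaussian kernel density estimate
# as a point estimate for the values observed in one trait bin.
density_mode <- function(values, ...) {
  d <- stats::density(values, ...)   # extra arguments (e.g. bw) go to density()
  d$x[which.max(d$y)]
}

# Usage: toy log-precipitation values of modern points in one bin
bin_values <- c(rep(5, 20), 1, 9)
est <- density_mode(bin_values)
est
```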
    # Load internal data
    data("geoPoints", package = "commecometrics")
    data("traits", package = "commecometrics")
    data("spRanges", package = "commecometrics")
    data("fossils", package = "commecometrics")

    # Step 1: Summarize modern trait values at sampling points
    traitsByPoint <- summarize_traits_by_point(
      points_df = geoPoints,
      trait_df = traits,
      species_polygons = spRanges,
      trait_column = "RBL",
      species_name_col = "sci_name",
      continent = FALSE,
      parallel = FALSE
    )

    # Step 2: Run an ecometric model with BIO12 (precipitation)
    ecoModel <- ecometric_model(
      points_df = traitsByPoint$points,
      env_var = "precip",
      transform_fun = function(x) log(x + 1),
      inv_transform_fun = function(x) exp(x) - 1,
      min_species = 3
    )

    # Step 3: Reconstruct fossil environments
    recon <- reconstruct_env(
      fossildata = fossils,
      model_out = ecoModel,
      match_nearest = TRUE,
      fossil_lon = "Long",
      fossil_lat = "Lat",
      modern_id = "ID",
      modern_lon = "Longitude",
      modern_lat = "Latitude"
    )
Uses fossil community trait summaries to reconstruct the most likely environmental category by projecting them onto a qualitative ecometric space built from modern data. Optionally, it assigns each fossil point to the nearest modern sampling point.
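When match_nearest = TRUE, each fossil site is paired with its closest modern sampling point. The package performs this step through sf using crs_proj; the sketch below instead uses plain great-circle (haversine) distances on a spherical Earth of radius 6371 km. Function names and the toy coordinates are hypothetical.

```r
# Great-circle distance in km between points given in decimal degrees
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  rad <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlon <- (lon2 - lon1) * rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical sketch: index of the nearest modern point for each fossil site
nearest_modern <- function(f_lon, f_lat, m_id, m_lon, m_lat) {
  idx <- vapply(seq_along(f_lon), function(i) {
    which.min(haversine_km(f_lon[i], f_lat[i], m_lon, m_lat))
  }, integer(1))
  data.frame(nearest_id = m_id[idx],
             distance_km = haversine_km(f_lon, f_lat, m_lon[idx], m_lat[idx]),
             stringsAsFactors = FALSE)
}

modern <- data.frame(ID = c("a", "b", "c"),
                     Longitude = c(0, 10, 0), Latitude = c(0, 0, 10),
                     stringsAsFactors = FALSE)
matches <- nearest_modern(f_lon = c(1, 9), f_lat = c(1, 1),
                          m_id = modern$ID,
                          m_lon = modern$Longitude, m_lat = modern$Latitude)
matches
```

A linear scan is fine for a few hundred sites; for large modern grids a spatial index (as sf provides) scales better.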
reconstruct_env_qual( fossildata, model_out, match_nearest = TRUE, fossil_lon = NULL, fossil_lat = NULL, modern_id = NULL, modern_lon = NULL, modern_lat = NULL, crs_proj = 4326 )
fossildata |
A data frame containing fossil trait summaries per fossil site.
Must include columns corresponding to the same two summary metrics used for modern communities,
using the column names specified by |
model_out |
Output list from |
match_nearest |
Logical; if TRUE, matches each fossil to the nearest modern point (default = TRUE). |
fossil_lon |
Name of the longitude column in |
fossil_lat |
Name of the latitude column in |
modern_id |
Name of the unique ID column in modern points (e.g., "GlobalID"). |
modern_lon |
Name of the longitude column in modern points. Required if |
modern_lat |
Name of the latitude column in modern points. Required if |
crs_proj |
Coordinate reference system to use when converting fossil and modern data to sf format (default = EPSG:4326). |
A data frame (fossildata) with reconstructed environmental values and optional nearest modern point data. Includes the following additional columns:
Numeric bin index for the first trait axis (based on first summary metric of trait distribution of fossil communities).
Numeric bin index for the second trait axis (based on second summary metric of trait distribution of fossil communities).
Predicted environmental category based on trait bin.
Probability of each environmental category for the assigned bin.
(Optional) ID of the nearest modern sampling point (if match_nearest = TRUE).
Additional columns from the matched modern site if match_nearest = TRUE.
    # Load internal data
    data("geoPoints", package = "commecometrics")
    data("traits", package = "commecometrics")
    data("spRanges", package = "commecometrics")
    data("fossils", package = "commecometrics")

    # Step 1: Summarize trait values at sampling points
    traitsByPoint <- summarize_traits_by_point(
      points_df = geoPoints,
      trait_df = traits,
      species_polygons = spRanges,
      trait_column = "RBL",
      species_name_col = "sci_name",
      continent = FALSE,
      parallel = FALSE
    )

    # Step 2: Run a qualitative ecometric model (e.g., land cover class)
    ecoModelQual <- ecometric_model_qual(
      points_df = traitsByPoint$points,
      category_col = "vegetation",
      min_species = 3
    )

    # Step 3: Reconstruct qualitative environments for fossil data
    reconQual <- reconstruct_env_qual(
      fossildata = fossils,
      model_out = ecoModelQual,
      match_nearest = TRUE,
      fossil_lon = "Long",
      fossil_lat = "Lat",
      modern_id = "ID",
      modern_lon = "Longitude",
      modern_lat = "Latitude"
    )
Evaluates how varying sample sizes affect the performance of ecometric models, focusing on two aspects:
Sensitivity (internal consistency): How accurately the model predicts environmental conditions on the same data on which it was trained.
Transferability (external applicability): How well the model performs on unseen data.
It tests different sample sizes by resampling the data multiple times (bootstrap iterations), training an ecometric model on each subset, and evaluating prediction error and correlation.
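The resampling step can be sketched as follows, assuming (as the sample_sizes argument states) that each subset is drawn without replacement and then split into training and testing sets using test_split. Fitting and scoring the model on each split is omitted, and the helper name is hypothetical.

```r
# Hypothetical sketch of one resampling iteration: draw `sample_size` row
# indices without replacement, then hold out a `test_split` fraction.
resample_split <- function(n_total, sample_size, test_split = 0.2) {
  stopifnot(sample_size <= n_total)
  subset_idx <- sample.int(n_total, sample_size)   # without replacement
  n_test <- round(sample_size * test_split)
  is_test <- seq_len(sample_size) %in% sample.int(sample_size, n_test)
  list(train = subset_idx[!is_test], test = subset_idx[is_test])
}

# Usage: 3 iterations for each of two sample sizes drawn from 100 points
set.seed(1)
splits <- lapply(c(40, 60), function(s) {
  replicate(3, resample_split(100, s), simplify = FALSE)
})
lengths(splits[[2]][[1]])   # train and test sizes for sample size 60
```

Using a logical mask rather than negative indexing keeps the split correct even when round(sample_size * test_split) is zero.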
sensitivity_analysis( points_df, env_var, sample_sizes, iterations = 20, test_split = 0.2, grid_bins_1 = NULL, grid_bins_2 = NULL, transform_fun = NULL, parallel = TRUE, n_cores = parallel::detectCores() - 1 )
points_df |
Output first element of the list from |
env_var |
Name of the environmental variable column in points_df (e.g., "precip"). |
sample_sizes |
Numeric vector specifying the number of communities (sampling points)
to evaluate in the sensitivity analysis. For each value, a random subset of the data of that
size is drawn without replacement and then split into training and testing sets using the
proportion defined by |
iterations |
Number of bootstrap iterations per sample size (default: 20). |
test_split |
Proportion of data to use for testing (default: 0.2). |
grid_bins_1 |
Number of bins for the first trait axis. If |
grid_bins_2 |
Number of bins for the second trait axis. If |
transform_fun |
Function to transform the environmental variable (default: NULL = no transformation). |
parallel |
Logical; whether to use parallel processing (default: TRUE). |
n_cores |
Number of cores to use for parallel processing (default: parallel::detectCores() - 1). |
Four base R plots are generated to visualize model performance as a function of sample size:
Training correlation vs. Sample size: Shows how well the model fits training data.
Testing correlation vs. Sample size: Shows generalizability to new data.
Training mean anomaly vs. Sample size: Shows average prediction error on training data.
Testing mean anomaly vs. Sample size: Shows average prediction error on test data.
Parallel processing is supported to speed up the analysis.
A list containing:
combined_results |
Raw iteration results as a data frame. Each row corresponds to one bootstrap iteration. |
summary_results |
Mean metrics across bootstrap iterations for each sample size. |
    # Load internal data
    data("geoPoints", package = "commecometrics")
    data("traits", package = "commecometrics")
    data("spRanges", package = "commecometrics")

    # Summarize trait values at sampling points
    traitsByPoint <- summarize_traits_by_point(
      points_df = geoPoints,
      trait_df = traits,
      species_polygons = spRanges,
      trait_column = "RBL",
      species_name_col = "sci_name",
      continent = FALSE,
      parallel = FALSE
    )

    # Run sensitivity analysis using annual precipitation
    sensitivityResults <- sensitivity_analysis(
      points_df = traitsByPoint$points,
      env_var = "precip",
      sample_sizes = seq(40, 90, 10),
      iterations = 5,
      transform_fun = function(x) log(x + 1),
      parallel = FALSE  # Set to TRUE for faster performance on multicore machines
    )

    # View results
    head(sensitivityResults$summary_results)
Evaluates how varying sample sizes affect the performance of ecometric models, focusing on two aspects:
Sensitivity (internal consistency): How accurately the model predicts environmental conditions on the same data on which it was trained.
Transferability (external applicability): How well the model performs on unseen data.
It tests different sample sizes by resampling the data multiple times (bootstrap iterations), training an ecometric model on each subset, and evaluating prediction accuracy on the training and testing sets.
sensitivity_analysis_qual( points_df, category_col, sample_sizes, iterations = 20, test_split = 0.2, grid_bins_1 = NULL, grid_bins_2 = NULL, parallel = TRUE, n_cores = parallel::detectCores() - 1 )
points_df |
Output first element of the list from |
category_col |
Name of the column containing the categorical environmental variable (e.g., "vegetation"). |
sample_sizes |
Numeric vector specifying the number of communities (sampling points)
to evaluate in the sensitivity analysis. For each value, a random subset of the data of that
size is drawn without replacement and then split into training and testing sets using the
proportion defined by |
iterations |
Number of bootstrap iterations per sample size (default = 20). |
test_split |
Proportion of data to use for testing (default = 0.2). |
grid_bins_1 |
Number of bins for the first trait axis. If |
grid_bins_2 |
Number of bins for the second trait axis. If |
parallel |
Logical; whether to run iterations in parallel (default = TRUE). |
n_cores |
Number of cores for parallelization (default = detectCores() - 1). |
Two plots are generated:
Training Accuracy vs. Sample size: Reflects internal model consistency.
Testing Accuracy vs. Sample size: Reflects external model performance.
Parallel processing is supported to speed up the analysis.
A list containing:
combined_results |
Raw iteration results as a data frame. Each row corresponds to one bootstrap iteration. |
summary_results |
Mean metrics across bootstrap iterations for each sample size. |
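One way to see how `summary_results` relates to `combined_results` is to average the per-iteration accuracies within each sample size. The sketch below assumes accuracy is the proportion of correctly predicted categories, and it uses invented column names (`sample_size`, `train_accuracy`, `test_accuracy`) and simulated values; the package's actual columns may differ.

```r
# Assumed accuracy definition: share of communities whose predicted category
# matches the observed one
classification_accuracy <- function(observed, predicted) {
  mean(observed == predicted, na.rm = TRUE)
}

# Simulated per-iteration results (hypothetical columns, not package output)
set.seed(2)
combined <- data.frame(
  sample_size = rep(seq(40, 90, 10), each = 5),
  iteration = rep(1:5, times = 6),
  train_accuracy = runif(30, 0.6, 0.9),
  test_accuracy = runif(30, 0.4, 0.8)
)

# Mean accuracy across iterations for each sample size
summary_tbl <- aggregate(
  cbind(train_accuracy, test_accuracy) ~ sample_size,
  data = combined,
  FUN = mean
)
summary_tbl
```

Plotting `train_accuracy` and `test_accuracy` against `sample_size` from such a table, for example with `plot(test_accuracy ~ sample_size, data = summary_tbl, type = "b")`, mirrors the idea behind the two plots described above.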
# Load internal data
data("geoPoints", package = "commecometrics")
data("traits", package = "commecometrics")
data("spRanges", package = "commecometrics")
# Summarize trait values at sampling points
traitsByPoint <- summarize_traits_by_point(
  points_df = geoPoints,
  trait_df = traits,
  species_polygons = spRanges,
  trait_column = "RBL",
  species_name_col = "sci_name",
  continent = FALSE,
  parallel = FALSE
)
# Run sensitivity analysis for dominant land cover class
sensitivityQual <- sensitivity_analysis_qual(
  points_df = traitsByPoint$points,
  category_col = "vegetation",
  sample_sizes = seq(40, 90, 10),
  iterations = 5,
  parallel = FALSE
)
# View results
head(sensitivityQual$summary_results)
A spatial dataset of species range polygons matching the species in the traits dataset.
spRanges
An sf object with the following columns:
Species name (matching the traits table)
Polygon geometry representing species distribution
Downloaded from the IUCN Red List website (IUCN, 2025).
For each spatial sampling point, this function calculates two user-specified summary metrics of a trait across all species whose range polygons overlap the point, along with species richness. Optionally, it assigns each point to a continent using Natural Earth data.
summarize_traits_by_point(
  points_df,
  trait_df,
  species_polygons,
  comm_metric_1 = function(x) mean(x, na.rm = TRUE),
  comm_metric_2 = function(x) sd(x, na.rm = TRUE),
  trait_column = "trait_name",
  species_name_col = "sci_name",
  continent = FALSE,
  lon_col = "Longitude",
  lat_col = "Latitude",
  parallel = TRUE,
  n_cores = parallel::detectCores() - 1
)
points_df |
A data frame containing sampling points with columns for longitude and latitude. |
trait_df |
A data frame of trait data. Must include a column for species names ('TaxonName') and the trait of interest (default = "trait_name"). |
species_polygons |
An |
comm_metric_1 |
A function used to summarize the trait values across overlapping species.
Defaults to |
comm_metric_2 |
A second function used to summarize trait values.
Defaults to |
trait_column |
The name of the trait column in |
species_name_col |
The name of the column in |
continent |
Logical. If |
lon_col |
Name of the longitude column in |
lat_col |
Name of the latitude column in |
parallel |
Logical; whether to parallelize the summarization step (default TRUE). |
n_cores |
Number of cores to use if parallelizing (default: detectCores() - 1). |
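Because `comm_metric_1` and `comm_metric_2` accept any function of a numeric vector, other summaries can be supplied. The sketch below defines two illustrative alternatives, `median_metric()` and `range_width_metric()` (hypothetical names, not package functions), that drop missing trait values, and shows that the default standard deviation returns `NA` when only one species at a point has trait data. The trait values are invented.

```r
# Median of trait values, ignoring missing entries
median_metric <- function(x) median(x, na.rm = TRUE)

# Range width (max - min) of trait values; NA when no trait data remain
range_width_metric <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_real_)
  max(x) - min(x)
}

# Invented relative blade length values for the species at one point
rbl <- c(0.52, 0.61, NA, 0.58)
median_metric(rbl)
range_width_metric(rbl)

# The default comm_metric_2, sd(x, na.rm = TRUE), is NA for a single value
sd(c(0.55), na.rm = TRUE)
```

Such functions would be passed as, for example, `comm_metric_1 = median_metric, comm_metric_2 = range_width_metric` in the call to `summarize_traits_by_point()`.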
A list with two elements:
A data frame identical to points_df but with additional columns:
Result of applying comm_metric_1 to the trait values of overlapping species (e.g., mean, max, median).
Result of applying comm_metric_2 to the trait values of overlapping species (e.g., standard deviation, range).
Number of species overlapping the point (regardless of trait availability).
Number of species with non-missing trait values at the point.
(Optional) Continent name assigned from Natural Earth data, if continent = TRUE.
A list of character vectors, each containing the names of species whose distribution polygons overlap a given sampling point.
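Given the list of overlapping species for each point, the per-point columns described above can be reproduced in base R. This is a minimal sketch, not the package's implementation: `summarize_overlap()` is a hypothetical helper, the output column names are assumptions, and the species names, trait values, and coordinates are invented. Richness counts every overlapping species, whereas the trait count includes only species with non-missing trait values.

```r
# Hypothetical helper: summarize a trait for each point from the list of
# species overlapping it (output column names are assumptions)
summarize_overlap <- function(overlap, trait_values,
                              metric_1 = function(x) mean(x, na.rm = TRUE),
                              metric_2 = function(x) sd(x, na.rm = TRUE)) {
  rows <- lapply(overlap, function(sp) {
    # Species without trait data are looked up as NA
    vals <- unname(trait_values[sp])
    data.frame(
      metric_1 = metric_1(vals),
      metric_2 = metric_2(vals),
      richness = length(sp),
      count_trait = sum(!is.na(vals))
    )
  })
  do.call(rbind, rows)
}

# Invented example: three points and a trait table lacking "sp_D"
points <- data.frame(Longitude = c(-100, -95, -90), Latitude = c(40, 42, 44))
trait_values <- c(sp_A = 0.50, sp_B = 0.60, sp_C = 0.70)
overlap <- list(c("sp_A", "sp_B"), c("sp_A", "sp_B", "sp_C", "sp_D"), "sp_C")

points_summary <- cbind(points, summarize_overlap(overlap, trait_values))
points_summary
```

The third point has a single species with trait data, so its second metric (standard deviation by default) is `NA`.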
# Load sample data from the package
data("geoPoints", package = "commecometrics")
data("traits", package = "commecometrics")
data("spRanges", package = "commecometrics")
traitsByPoint <- summarize_traits_by_point(
  points_df = geoPoints,
  trait_df = traits,
  species_polygons = spRanges,
  trait_column = "RBL",
  species_name_col = "sci_name",
  continent = FALSE,
  parallel = FALSE
)
head(traitsByPoint$points)
A dataset of relative blade length (RBL) values for five species in the order Carnivora.
These species match those in the spRanges dataset.
traits
A data frame with the following columns:
Species name (binomial)
Relative blade length (unitless ratio)
Siciliano-Martina et al. (2024). Ecology and Evolution, 14(10), e70214.
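Because trait values are matched to range polygons by species name, it can help to confirm that the two datasets share the same names before running `summarize_traits_by_point()`. The sketch below uses invented species vectors and a hypothetical `normalize_names()` helper that trims and collapses whitespace; in practice the vectors would come from the species columns of `traits` and `spRanges`, whose exact column names are not assumed here.

```r
# Hypothetical helper: trim and collapse whitespace in species names
normalize_names <- function(x) gsub("\\s+", " ", trimws(x))

# Invented species lists standing in for the trait table and range polygons
trait_species <- normalize_names(c("sp_A", "sp_B ", "sp_C"))
range_species <- normalize_names(c("sp_A", "sp_B", "sp_D"))

# Species with trait values but no range polygon, and vice versa
missing_ranges <- setdiff(trait_species, range_species)
missing_traits <- setdiff(range_species, trait_species)
missing_ranges
missing_traits
```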