Integrating Research Data into Official Statistics
1. Outcome
This Circular provides guidance on bridging the gap between scientific research data and official statistics for the compilation of ocean accounts. Ocean accounting requires diverse data sources that extend beyond traditional statistical surveys and administrative records to include oceanographic observations, biodiversity monitoring, ecosystem assessments, and other forms of research data generated by universities, research institutions, and international scientific programmes. This Circular addresses the systematic integration of research data into statistical production, maintaining the quality standards expected of official statistics while leveraging the unique capabilities of scientific research organisations.
By implementing the guidance in this Circular, practitioners will be able to identify research data sources relevant to ocean accounting applications, assess their fitness for statistical purposes using quality frameworks adapted from the United Nations National Quality Assurance Framework (UN NQAF), establish institutional partnerships with research organisations, and compile ocean accounts that combine traditional statistical sources with research data in a transparent and methodologically sound manner. The specific applications addressed include the use of oceanographic research data for ecosystem condition accounts (see TG-3.5 Ecosystem Condition), stock assessment data for fisheries accounts (see TG-6.7 Fisheries Stock Assessment), and biodiversity monitoring data for extent accounts (see TG-2.9 Ecosystem Extent).
The quality assessment dimensions presented here build on the overarching quality framework established in TG-0.7 Quality Assurance, while the data harmonisation techniques needed to reconcile research data with other sources are detailed in TG-4.6 Data Harmonisation. Key terms used in this Circular are defined in TG-0.6 Glossary.
2. Requirements
This Circular requires familiarity with:
- TG-0.1 General Introduction to Ocean Accounts: provides foundational understanding of ocean accounts components and the relationship between environmental and economic accounting frameworks, including the conceptual basis for integrating diverse data sources into a coherent accounting system.
- TG-0.7 Quality Assurance: establishes the overarching quality framework applicable to ocean accounting data, including the quality dimensions and assessment procedures that this Circular applies specifically to research data sources.
3. Guidance Material
The compilation of ocean accounts frequently requires data from scientific research programmes that were not originally designed for statistical purposes. This presents both opportunities and challenges. Research data can fill critical gaps in ocean observation and ecosystem monitoring that traditional statistical sources cannot address. However, integrating such data into official statistics requires careful attention to quality assessment, metadata documentation, and institutional coordination. This Circular provides a systematic approach to navigating these challenges.
The quality considerations discussed here should be understood within the broader quality assurance framework described in TG-0.7 Quality Assurance. For guidance on reconciling data from multiple sources with different classifications and spatial boundaries, see TG-4.6 Data Harmonisation.
3.1 Decision Use Cases for Research Data
Research data integration supports specific decision-making applications in ocean accounting. This section identifies the primary use cases where research data sources provide essential inputs that traditional statistical sources cannot supply.
3.1.1 Integration modes and governance accountability
Before identifying specific use cases, practitioners should classify each prospective integration of research data into one of three modes, because each mode triggers different governance requirements within the National Statistical Office (NSO). This classification is material to the NSO's legal and audit accountability, since replacing an official source with research data may require formal endorsement from the national statistical authority rather than a technical decision by the compiling unit.
Mode A—Gap-filling. No official statistical source exists for the required variable. Research data are the only feasible input. Examples include subsurface dissolved oxygen profiles in pelagic waters, eDNA-based species occurrence in remote habitats, and acoustic biomass estimates for non-commercial species. Governance: technical judgement by the compiling unit, documented in account metadata under the standard quality-assessment workflow set out in Section 3.5.
Mode B—Supplementary use. An official source exists but research data are used in parallel as a secondary validation source, a cross-check, or a higher-resolution overlay. Examples include using satellite chlorophyll products alongside ship-based water-quality monitoring, or using research stock assessments to validate administrative catch records. Governance: technical judgement by the compiling unit; coherence test required under Section 3.5.3 Step 3.3.
Mode C—Primary substitution. An official source exists but is replaced by research data judged of higher quality. Examples include replacing a discontinued shipboard nutrient monitoring programme with an autonomous Argo-derived product, or substituting an obsolete remote-sensing land-cover product with a research-grade national mapping. Governance: requires chief statistician (or equivalent national statistical authority) endorsement, plus a documented UN NQAF Level A institutional-framework compliance check, before the substituted source can be published in an official account. The substitution decision and its endorsement must be recorded in the account compilation metadata.
The remaining sub-sections of Section 3.1 identify the principal use cases. The mode applicable to a given source should be determined by the compiling NSO when applying the Section 3.5 procedure.
Figure TG-4.5-F1. Three sequential admissibility gates (Q1 to Q3) screen a research dataset for variable coverage, metadata completeness, and quality threshold compliance; a mode routing gate (Q4) then classifies the admissible source as Mode A (gap-filling, no official source exists), Mode B (supplementary, official source retained), or Mode C (primary substitution). Mode C requires chief statistician endorsement before publication in official accounts.
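The gate sequence in Figure TG-4.5-F1 can be sketched as a small routing function. This is an illustrative sketch only: the function name, argument names, and the returned dictionary layout are assumptions made for the example and are not prescribed by this Circular.

```python
from typing import Dict, Optional, Union


def route_research_source(
    covers_variable: bool,         # Q1: does the dataset cover the required variable?
    metadata_complete: bool,       # Q2: is the metadata complete?
    meets_thresholds: bool,        # Q3: does it meet the quality thresholds?
    official_source_exists: bool,  # Q4: is there an official source for the variable?
    replaces_official: bool = False,  # Q4: would the research data replace that source?
    endorsed: bool = False,           # chief statistician endorsement (Mode C only)
) -> Dict[str, Union[bool, Optional[str]]]:
    """Apply admissibility gates Q1 to Q3, then route the source to Mode A, B or C."""
    if not (covers_variable and metadata_complete and meets_thresholds):
        # Failing any admissibility gate stops the process before mode routing.
        return {"admissible": False, "mode": None, "publishable": False}

    if not official_source_exists:
        mode = "A"  # gap-filling
    elif replaces_official:
        mode = "C"  # primary substitution
    else:
        mode = "B"  # supplementary use alongside the official source

    # Only Mode C carries the additional endorsement requirement.
    publishable = endorsed if mode == "C" else True
    return {"admissible": True, "mode": mode, "publishable": publishable}


if __name__ == "__main__":
    # Hypothetical case: research product replacing a discontinued official series.
    print(route_research_source(True, True, True, True, replaces_official=True))
```

In this sketch a Mode C source remains admissible but is flagged as not publishable until the endorsement is recorded, which mirrors the governance rule in Section 3.1.1.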
3.1.2 Ecosystem condition accounts
Ecosystem condition accounts require measurements of biophysical and chemical variables that characterise the state of marine ecosystems[1]. For ocean environments, many of these measurements are only available through oceanographic research programmes:
Oceanographic surveys provide temperature, salinity, dissolved oxygen, nutrient concentrations, and pH measurements throughout the water column. The Global Ocean Observing System (GOOS) coordinates international efforts to standardise observation methods and improve data sharing[2]. For condition accounts in pelagic waters (see TG-6.5 Pelagic and Open Ocean Accounting), research vessel surveys and autonomous profiling floats (such as the Argo network) are the primary sources for subsurface condition data.
Acoustic surveys estimate biomass of schooling pelagic fish and benthic invertebrates using scientific echosounders. Regional fisheries management organisations (RFMOs) often conduct acoustic surveys to assess stock condition, providing data that can be integrated into ecosystem condition accounts where fish biomass serves as a condition indicator[3].
Water quality monitoring by environmental research agencies tracks pollution levels, turbidity, and harmful algal blooms. For coastal condition accounts, research monitoring of nitrogen and phosphorus loading provides essential data on eutrophication pressure.
The SEEA Ecosystem Accounting framework identifies specific condition characteristics that typically require research data inputs, including biotic characteristics (species diversity, biomass, community composition), abiotic characteristics (water temperature, pH, salinity), and functional characteristics (primary productivity, nutrient cycling)[4].
3.1.3 Fisheries stock assessment
Stock assessment for commercial and subsistence fisheries depends heavily on scientific research data[5]:
Catch-at-age data from research vessel surveys provide independent estimates of population abundance and age structure, complementing catch data reported by commercial vessels. For highly migratory species such as tuna, international research programmes coordinated through RFMOs are often the only source of systematic catch-at-age information across the species' range.
Tagging programmes using electronic tags, acoustic telemetry, and genetic markers reveal migration patterns, population structure, and survival rates. Close-kin mark-recapture methods using genetic analysis allow estimation of absolute abundance for high-value species such as southern bluefin tuna, where traditional survey methods are impractical[6].
Life history parameters including growth rates, natural mortality, and fecundity are typically derived from research laboratory studies and field sampling programmes rather than routine statistical collection.
National Statistical Offices (NSOs) compiling fisheries accounts rely on stock assessments produced by fisheries research institutes to estimate the physical and monetary value of aquatic resources, as described in TG-3.1 Asset Accounts.
Data-rich versus data-poor stock assessments. NSOs must distinguish between data-rich stock assessment methods (e.g., age-structured Virtual Population Analysis, integrated statistical catch-at-age models)—which combine catch-at-age, abundance indices, and life-history parameters to produce biomass estimates with characterisable confidence intervals—and data-poor methods (e.g., Length-Based Spawning Potential Ratio (LBSPR), Catch-Maximum Sustainable Yield (CMSY), Depletion-Based Stock Reduction Analysis (DB-SRA))—which rely on far fewer inputs and produce estimates with much wider, often order-of-magnitude uncertainty ranges[7]. The quality tier reported alongside fisheries asset accounts (Tier 1, 2, or 3 in SEEA EA terminology) must reflect the underlying assessment method: data-rich age-structured assessments typically support Tier 3 reporting, while data-poor length-based or catch-only methods should generally be reported as Tier 1, with the methodological caveats made explicit in the account release notes. Publishing a fisheries asset account derived from a data-poor assessment without clearly communicating this limitation can mislead policy audiences about the precision of the stock estimate.
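As a minimal sketch of the rule above, the reporting tier can be derived from the assessment method with a simple lookup. The short method labels (for example `"SCAA"` for statistical catch-at-age) and the function name are hypothetical, and the mapping encodes only the typical cases stated in the paragraph; an NSO may need to depart from it for a specific stock.

```python
# Hypothetical short labels for the methods named in the text.
DATA_RICH_METHODS = {"VPA", "SCAA"}              # age-structured, data-rich
DATA_POOR_METHODS = {"LBSPR", "CMSY", "DB-SRA"}  # length-based or catch-only


def reporting_tier(method: str) -> int:
    """Return the typical reporting tier implied by a stock assessment method."""
    key = method.strip().upper()
    if key in DATA_RICH_METHODS:
        return 3  # data-rich assessments typically support Tier 3
    if key in DATA_POOR_METHODS:
        return 1  # data-poor assessments should generally be reported as Tier 1
    raise ValueError(f"Unrecognised assessment method: {method!r}")


def needs_release_caveat(method: str) -> bool:
    """Data-poor assessments require an explicit caveat in the release notes."""
    return reporting_tier(method) == 1


if __name__ == "__main__":
    for m in ("VPA", "CMSY"):
        print(m, reporting_tier(m), needs_release_caveat(m))
```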
3.1.4 Biodiversity and extent mapping
Research programmes provide essential data for ecosystem extent accounts and biodiversity indicators:
Benthic habitat mapping using multibeam echosounder surveys, remotely operated vehicles (ROVs), and drop cameras classifies seabed substrates and identifies ecosystem types such as cold-water coral reefs, sponge beds, and seagrass meadows. National hydrographic offices and marine research institutes are typically the custodians of these data.
Species occurrence records from biodiversity surveys, museum collections, and citizen science programmes are aggregated in global repositories such as the Ocean Biodiversity Information System (OBIS) and the Global Biodiversity Information Facility (GBIF)[8]. For ecosystem extent accounts based on the IUCN Global Ecosystem Typology, these occurrence data support delineation of ecosystem functional groups.
Coral reef monitoring through the Global Coral Reef Monitoring Network and regional programmes such as the Coral Triangle Initiative provides systematic assessments of coral cover, bleaching events, and reef condition that underpin extent and condition accounts for coral reef ecosystems (see TG-6.1 Coral Reef Ecosystem Accounting).
The Framework for the Development of Environment Statistics (FDES) notes that "scientific research data can be used to address data gaps" in environmental statistics, particularly for parameters that require specialised measurement techniques[9].
3.2 Types of Research Data
Research data relevant to ocean accounting encompasses a diverse range of sources, each with distinct characteristics, collection methodologies, and quality considerations. Understanding these distinctions is essential for identifying appropriate data sources and assessing their fitness for accounting purposes.
3.2.1 Oceanographic observation data
Oceanographic observation systems generate continuous or periodic measurements of physical, chemical, and biological ocean parameters[10]. These include temperature, salinity, currents, dissolved oxygen, nutrient concentrations, and chlorophyll levels. Major international programmes such as the Global Ocean Observing System (GOOS), the Argo float network, and regional ocean observing systems (e.g., the Integrated Marine Observing System in Australia) provide standardised data streams that can support ecosystem condition accounts[11]. The Intergovernmental Oceanographic Commission (IOC) of UNESCO coordinates international efforts to improve ocean observation capacity and data sharing[12].
Such data are typically collected using instrumented platforms including research vessels, moored buoys, autonomous underwater vehicles, satellite remote sensing, and profiling floats. The primary advantages include: systematic temporal coverage enabling trend analysis; standardised measurement protocols developed through international scientific consensus; and increasingly open access through global data repositories. However, spatial coverage can be uneven, with data density varying significantly between coastal and open ocean areas, and between developed and developing country waters[13].
Among the most directly relevant oceanographic variables for ecosystem condition accounts are the Essential Ocean Variables (EOVs) defined by GOOS. The EOV framework organises ocean observations into physics, biogeochemistry, and biology/ecosystems domains[14]. For ecosystem condition accounts, priority EOVs include sea surface temperature, dissolved oxygen, inorganic carbon, ocean colour (as a proxy for phytoplankton biomass), and marine habitat properties. For ecosystem extent accounts, relevant EOVs include hard coral cover, seagrass cover, mangrove cover, and macroalgal canopy cover. Practitioners should consult the current GOOS EOV specification sheets, which document readiness levels, observation requirements, and data product availability for each variable, to determine which EOVs are feasible data sources for their national accounting context.
3.2.2 Biodiversity survey data
Biodiversity surveys document species occurrence, abundance, and distribution through structured sampling programmes[15]. For marine environments, these include fish stock assessments, invertebrate surveys, marine mammal and seabird censuses, coral reef monitoring, and seagrass mapping exercises. Such data are fundamental for ecosystem extent accounts (mapping ecosystem types), ecosystem condition accounts (assessing biodiversity indicators), and ecosystem services accounts (quantifying provisioning services such as fisheries).
Biodiversity data are collected through diverse methodologies including visual census, acoustic surveys, environmental DNA (eDNA) sampling, and citizen science programmes[16]. The Global Biodiversity Information Facility (GBIF) aggregates species occurrence records from research institutions worldwide and provides standardised access through its data portal[17]. Thematic initiatives such as the Ocean Biodiversity Information System (OBIS) focus specifically on marine biodiversity data[18].
A key consideration is that biodiversity surveys often follow sampling designs optimised for scientific research questions rather than comprehensive spatial coverage. This can result in sampling bias towards accessible locations or areas of particular scientific interest, which must be addressed when using such data for area-based ecosystem accounts[19].
3.2.3 Ecosystem monitoring programmes
Long-term ecosystem monitoring programmes track changes in ecosystem structure, function, and condition over time. Examples relevant to ocean accounting include coral reef monitoring networks (e.g., the Global Coral Reef Monitoring Network), mangrove forest assessments, seagrass habitat mapping, and kelp forest monitoring programmes[20]. These programmes typically combine remote sensing data with ground-truthing surveys to map ecosystem extent and assess condition indicators.
The SEEA Ecosystem Accounting framework recommends using a tiered approach for ecosystem monitoring, with Tier 1 using globally available default data, Tier 2 using regionally appropriate data, and Tier 3 using nationally collected data with full spatial and temporal coverage[21]. Research monitoring programmes often provide the foundation for Tier 2 and Tier 3 approaches, particularly for marine ecosystems where routine statistical collection is limited.
3.2.4 Remote sensing and Earth observation data
Satellite-based Earth observation provides systematic, repeated coverage of ocean and coastal areas at scales relevant to national accounting[22]. Key parameters measurable from space include sea surface temperature, ocean colour (indicative of chlorophyll and primary productivity), sea level, surface currents, coastal land cover change, and wetland extent. The European Union's Copernicus Marine Service and NASA's Ocean Biology Processing Group provide validated ocean data products derived from multiple satellite sensors[23].
Remote sensing data are particularly valuable for their temporal frequency (enabling detection of change) and spatial comprehensiveness (enabling complete national coverage). However, limitations include cloud cover interference, the inability to observe below the sea surface, and the requirement for ground-truthing to validate derived products. The SEEA Technical Guidance on Biophysical Modelling notes that remote sensing offers "enormous opportunities to disseminate data with very short time-lags and high-frequency"[24].
The major satellite platforms relevant to ocean accounting—Sentinel-2, Landsat, MODIS, and Sentinel-1 SAR—span spatial resolutions from 10 metres to 1 kilometre, with revisit times from daily (MODIS) to 16 days (Landsat). For comprehensive guidance on sensor characteristics and selection, see TG-4.1 Remote Sensing and Geospatial Data.[25]
Atmospheric correction validation in coastal and optically complex waters. Standard atmospheric correction algorithms designed for open-ocean ("Case 1") waters, including DARK_PIXEL and the SeaDAS default, are known to perform poorly in turbid, shallow, sediment-laden, or chlorophyll-rich coastal ("Case 2") waters. The systematic biases this introduces propagate directly into derived products such as chlorophyll-a, suspended sediment, and turbidity, which are the principal remote sensing inputs to coastal condition accounts. For NSOs in tropical and developing country contexts, where Case 2 waters dominate the coastal zone, in situ validation of satellite-derived products is mandatory before use in official accounts. Practitioners should: (i) confirm that the product is derived from a Case 2-capable processor, such as the Case-2 Regional CoastColour processor (C2RCC), and consult the Copernicus Marine Service product quality factsheet for the specific product version; (ii) verify validation status against EUMETSAT/ACRI atmospheric correction validation protocols; and (iii) where feasible, perform a matchup analysis against in situ radiometric measurements from the AERONET Ocean Color network (AERONET-OC) or a comparable national field campaign. Uncorrected or unvalidated products should not be published as the basis of official condition statistics for coastal waters.
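A matchup analysis of the kind described in step (iii) reduces, at its simplest, to comparing paired satellite and in situ values. The sketch below computes the mean bias and root-mean-square error for such pairs. The function name and the sample values are illustrative assumptions; the statistics are computed in linear space, whereas chlorophyll matchups are often evaluated on log-transformed values, and the pairing rules (time window, pixel box) must follow the validation protocol actually adopted.

```python
import math
from typing import Dict, Sequence


def matchup_stats(satellite: Sequence[float], in_situ: Sequence[float]) -> Dict[str, float]:
    """Mean bias and RMSE of satellite values against co-located in situ values."""
    if len(satellite) != len(in_situ) or len(satellite) == 0:
        raise ValueError("Inputs must be non-empty and of equal length")
    diffs = [s - o for s, o in zip(satellite, in_situ)]
    n = len(diffs)
    bias = sum(diffs) / n                            # positive: satellite overestimates
    rmse = math.sqrt(sum(d * d for d in diffs) / n)  # overall matchup error
    return {"n": float(n), "bias": bias, "rmse": rmse}


if __name__ == "__main__":
    # Hypothetical chlorophyll-a matchups (mg per cubic metre); not real observations.
    sat = [1.8, 2.4, 5.1, 0.9]
    obs = [1.5, 2.0, 3.9, 1.0]
    print(matchup_stats(sat, obs))
```

A persistent bias of one sign across matchups is the pattern to look for, since it indicates a systematic atmospheric correction error rather than random noise.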
3.2.5 Scientific research publications and datasets
Peer-reviewed scientific publications represent an important source of coefficients, conversion factors, and methodological parameters for ecosystem service modelling[26]. For example, estimates of carbon sequestration rates in mangroves, nutrient retention by seagrass meadows, or coastal protection values from coral reefs are frequently derived from published research rather than direct national measurement. While individual studies may be site-specific, synthesis studies and meta-analyses can provide generalised values applicable across similar ecosystem types.
Research datasets underlying publications are increasingly required to be deposited in open repositories as a condition of publication or funding. The Framework for the Development of Environment Statistics (FDES) notes that scientific research data "are usually available at no or low cost" and "can be used to address data gaps"[27]. However, the FDES also cautions that such data "often use terms and definitions that differ from those used in statistics", may have "limited scope", and are "often available on a one-time basis only"[28].
Coefficient selection protocol where multiple published estimates exist. It is common—especially for developing country ocean accounts compiling blue carbon, nutrient retention, or coastal protection coefficients—to encounter multiple published estimates for the same parameter that span an order of magnitude or more. Mangrove soil carbon sequestration estimates, for example, span roughly 0.5 to 15 Mg C ha⁻¹ yr⁻¹ in the published literature, with central tendency varying by biogeography, stand age, and methodology. Where multiple estimates exist, NSOs should follow this four-step coefficient selection protocol:
- Prefer systematic reviews and meta-analyses over individual studies. Where a published systematic review or meta-analysis exists for the parameter of interest, use its central estimate as the default value rather than averaging across individual studies. Meta-analyses typically weight estimates by precision or sample size, and many also account for study quality.
- Where multiple meta-analyses exist, document the range and select the most geographically and ecologically appropriate value. Geographic and ecological matching (e.g., tropical vs. temperate mangroves; high-rainfall vs. arid coastal settings) should be explicit and justified in writing. A simple arithmetic mean of meta-analyses with different geographic scopes is not recommended.
- Apply a sensitivity analysis across the plausible range. Re-compute the affected account aggregate using at least the lower and upper bounds of the range of credible estimates, and report the resulting account values as a sensitivity envelope alongside the central estimate. This propagates coefficient uncertainty into the account.
- Document the selection rationale in account metadata. The selected coefficient, its source, the alternatives considered and rejected, the geographic-matching justification, and the sensitivity envelope must be recorded in the account compilation metadata to support audit and replication.
Where SEEA EA biome-specific default values are available, these should be used as the baseline starting point for the protocol and any departure justified.
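The sensitivity step of the protocol can be illustrated with a short calculation. The sketch assumes the simplest case, in which the account aggregate is the ecosystem extent multiplied by a per-hectare coefficient; the function name, the 1,000 ha extent, and the central value of 2.0 are hypothetical, while the bounds of 0.5 and 15 Mg C ha⁻¹ yr⁻¹ are the literature range quoted above.

```python
from typing import Dict


def sensitivity_envelope(area_ha: float, central: float, lower: float, upper: float) -> Dict[str, float]:
    """Re-compute an extent-times-coefficient aggregate at the lower, central and upper values."""
    if not (lower <= central <= upper):
        raise ValueError("Expected lower <= central <= upper")
    return {
        "lower": area_ha * lower,
        "central": area_ha * central,
        "upper": area_ha * upper,
    }


if __name__ == "__main__":
    # Hypothetical mangrove extent and central coefficient; bounds from the quoted range.
    envelope = sensitivity_envelope(area_ha=1000.0, central=2.0, lower=0.5, upper=15.0)
    for label, value in envelope.items():
        print(f"{label}: {value:.0f} Mg C per year")
```

For these inputs the envelope runs from 500 to 15,000 Mg C per year around a central value of 2,000, which shows why the chosen coefficient and its justification need to be recorded in the account metadata.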
3.3 Quality Assessment Frameworks
Research data must be assessed for fitness for statistical purposes before incorporation into ocean accounts. The quality dimensions applied to research data differ in emphasis from those applied to survey data, reflecting the distinct characteristics of scientific data production. For the overarching quality framework applicable to all ocean accounting data, see TG-0.7 Quality Assurance.
3.3.1 Dimensions of data quality
The United Nations National Quality Assurance Framework (UN NQAF) identifies quality dimensions applicable to official statistics including relevance, accuracy, reliability, timeliness, punctuality, accessibility, clarity, coherence, and comparability[29]. When evaluating research data for ocean accounting, particular attention should be given to:
Relevance—The degree to which data meet the needs of users. For ocean accounting, relevance assessment should consider whether the research data address the specific ecosystem components, condition characteristics, or service flows required by the account structure. The UN NQAF notes that "relevance is concerned with whether the available statistics meet the needs of users" and that "assessing relevance is a subjective matter dependent upon the varying needs of users"[30]. For research data, relevance questions include: Does the spatial coverage align with the accounting area? Does the temporal resolution match the accounting period? Are the measured variables directly usable or do they require transformation?
Accuracy and reliability—The degree to which data correctly measure the phenomena they are designed to measure. For research data, this assessment should consider sampling design adequacy, measurement methodology validation, error propagation in derived products, and replication of results[31]. The SEEA Technical Guidance on Biophysical Modelling provides detailed guidance on accuracy assessment for modelled data, including approaches for validating look-up tables, process-based models, and machine learning outputs[32].
Coherence—The degree to which data can be reliably combined with other data from different sources. Research data often use classifications, definitions, and spatial boundaries that differ from statistical standards. Achieving coherence requires mapping research classifications to standard statistical classifications (such as the IUCN Global Ecosystem Typology for ecosystem types) and reconciling spatial units[33]. See TG-4.6 Data Harmonisation for detailed guidance on harmonisation approaches.
Comparability—The degree to which data are comparable over time and across space. Scientific monitoring programmes may change methodologies as measurement technology improves, creating breaks in time series. Documentation of methodological changes and development of bridging factors may be required to maintain temporal comparability[34].
Accessibility—The ease with which data can be obtained and used. The UN NQAF emphasises that "accessibility refers to the physical conditions under which users can obtain data" and includes "the ease with which the existence of information can be ascertained, as well as the suitability of the form or medium through which the information can be accessed"[35]. For research data, accessibility considerations include licensing restrictions, data format compatibility, and the existence of documented data access protocols. The FAIR principles, defined in Section 3.4.1, provide a complementary framework for this dimension.
Minimum acceptability thresholds. Identifying the relevant dimensions is necessary but not sufficient: NSOs require a decision rule for when a research dataset can be admitted to an official account. Table 3.3.1 specifies, for each UN NQAF dimension, the conditions under which a data source is (a) acceptable without qualification, (b) acceptable with documented caveats, or (c) not acceptable for use in official accounts. The thresholds are anchored to international standards where these exist (e.g., the IOC/GOOS ±0.2 mg/L accuracy precedent for dissolved oxygen) and otherwise expressed as documentation requirements that allow defensible compilation. The ±0.2 mg/L figure is used here as an exemplar of how a published international standard can be carried directly into the integration decision; equivalent dimension-specific thresholds should be substituted where they exist for other variables.
Table 3.3.1: Minimum acceptability thresholds by UN NQAF quality dimension
| Dimension | Acceptable without qualification | Acceptable with documented caveats | Not acceptable |
|---|---|---|---|
| Relevance | Spatial coverage, temporal resolution, and measured variables directly match account requirement | Partial match; transformation or scaling required and documented | Variable is conceptually misaligned with the account; no defensible transformation |
| Accuracy | Validated against an international standard (e.g., IOC/GOOS ±0.2 mg/L for DO); error budget published | Accuracy estimate available but not against a recognised standard; uncertainty disclosed in account release | No accuracy assessment available and no alternative validation feasible |
| Reliability | Documented sampling design, repeated comparable surveys, published precision | Single-event survey with documented methods; precision inferred from method | Methods undocumented; no precision basis |
| Coherence | Classification crosswalk to SEEA/IUCN GET exists; co-located comparison available | Mapping requires interpretive judgement; documented and reviewed | Categories not mappable to statistical classifications |
| Comparability | Stable methodology across full time series; no breaks | Documented methodological change with bridging factors applied | Undocumented method change creating an unresolvable break |
| Accessibility | Open licence, persistent identifier, machine-readable format | Restricted licence with documented terms; access via MoU | No defined access path; licence prohibits statistical use |
| Timeliness | Update schedule compatible with the accounting cycle | Lag exceeds accounting cycle by no more than one period; documented | Lag exceeds two accounting periods with no commitment to update |
Note on Mode C (primary substitution) entries: Where the integration mode is Mode C under Section 3.1.1 (i.e., a research dataset replaces an existing official source), all threshold assessments in Table 3.3.1 remain applicable, but the overall admission decision additionally requires chief-statistician (or equivalent national statistical authority) endorsement before the source can be published in an official account. Mode C sources that pass all Table 3.3.1 thresholds but lack this endorsement may not be published.
A data source rated "Not acceptable" on any compilation-blocking dimension (accuracy, coherence, or accessibility) cannot be admitted to an official account without remediation. Sources rated "Acceptable with caveats" may be admitted provided the caveats are recorded in account metadata and surfaced in the account release.
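The decision rule built on Table 3.3.1 can be sketched as follows. The rating labels, return strings, and function name are assumptions made for the example. The text does not state how a "Not acceptable" rating on a non-blocking dimension should be handled, so the sketch flags that case for review rather than deciding it.

```python
from typing import Dict

BLOCKING_DIMENSIONS = {"accuracy", "coherence", "accessibility"}
VALID_RATINGS = {"acceptable", "caveats", "not_acceptable"}


def admission_decision(ratings: Dict[str, str], mode: str = "A", endorsed: bool = False) -> str:
    """Turn per-dimension ratings into an admission outcome for one data source."""
    for dimension, rating in ratings.items():
        if rating not in VALID_RATINGS:
            raise ValueError(f"Unknown rating {rating!r} for {dimension!r}")

    # A failure on any compilation-blocking dimension stops admission.
    if any(ratings.get(d) == "not_acceptable" for d in BLOCKING_DIMENSIONS):
        return "reject_pending_remediation"

    # Assumption: failures on other dimensions are escalated, not auto-decided.
    if any(r == "not_acceptable" for r in ratings.values()):
        return "needs_review"

    # Mode C sources also need chief statistician endorsement before publication.
    if mode == "C" and not endorsed:
        return "hold_pending_endorsement"

    if any(r == "caveats" for r in ratings.values()):
        return "admit_with_caveats"
    return "admit"


if __name__ == "__main__":
    example = {
        "relevance": "acceptable", "accuracy": "caveats", "reliability": "acceptable",
        "coherence": "acceptable", "comparability": "acceptable",
        "accessibility": "acceptable", "timeliness": "caveats",
    }
    print(admission_decision(example, mode="B"))
```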
3.3.2 Reproducibility, replicability, and scientific rigour
Reproducibility (obtaining consistent results using the same data and methods[36]) is essential for updatable ocean accounts. Replicability (obtaining consistent results when new data are collected using the same methods[37]) provides confidence that coefficients derived in one context can be applied in another. The SEEA Technical Guidance on Biophysical Modelling emphasises that "transparency of approaches is essential"[38].
Three-level reproducibility standard for integration. To convert reproducibility from an aspirational requirement into an auditable gate, NSOs should apply the following three-level standard when assessing research data:
- Level 1—Minimum acceptable for integration of empirical (non-model) data. Methods are fully described in a peer-reviewed publication or equivalent technical report. Raw inputs and processing code need not be publicly archived, but the methods must be sufficient to allow a competent practitioner to reproduce the analysis given the inputs.
- Level 2—Preferred standard, and the minimum required for any model-based output (e.g., biophysical model results, modelled ecosystem service values, statistically interpolated condition indices). Processing code and processed data are archived in a public repository (e.g., Zenodo, GitHub) with documented inputs and a usable readme. Cited datasets carry persistent identifiers.
- Level 3—Highest standard. All of Level 2, plus archived raw inputs and a containerised or fully scripted processing environment (e.g., Docker, Singularity, or a versioned conda environment) that enables byte-exact reproduction of outputs.
Legacy empirical datasets that pre-date open-data mandates may be integrated under Level 1, provided their methods are documented in peer-reviewed publications. Outputs of biophysical or statistical models must meet at least Level 2 before being admitted to an official account; this is the level at which audit exposure is highest because the result depends on assumptions and code rather than direct measurement.
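The three-level standard can be expressed as an auditable check. The sketch below assumes the levels are cumulative (each level presupposes the one below it); the class and field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReproEvidence:
    """Evidence collected for one dataset (field names are illustrative)."""
    methods_published: bool                  # peer-reviewed paper or technical report
    code_and_processed_data_archived: bool   # public repository with readme
    persistent_identifiers: bool             # cited datasets carry PIDs
    raw_inputs_archived: bool
    scripted_environment: bool               # container or versioned environment


def reproducibility_level(e: ReproEvidence) -> int:
    """Return 0 (below Level 1) up to 3, treating levels as cumulative."""
    if not e.methods_published:
        return 0
    level = 1
    if e.code_and_processed_data_archived and e.persistent_identifiers:
        level = 2
        if e.raw_inputs_archived and e.scripted_environment:
            level = 3
    return level


def meets_reproducibility_gate(e: ReproEvidence, model_based: bool) -> bool:
    """Model-based outputs need Level 2; empirical data need Level 1."""
    required = 2 if model_based else 1
    return reproducibility_level(e) >= required
```

Under this reading, a legacy empirical dataset documented only in a peer-reviewed paper passes the gate, while a model output with the same evidence does not.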
3.3.3 Uncertainty quantification
Research data are characterised by various sources of uncertainty that must be documented and, where possible, quantified. Uncertainty can arise from sampling variability, measurement error, model parameter uncertainty, and structural model uncertainty. The SEEA Technical Guidance on Biophysical Modelling notes that "uncertainty matrices, which outline possible sources of uncertainty for each model" provide a basic approach to uncertainty documentation[39].
For ecosystem accounts derived from biophysical models, "outputs should be seen as best estimates, rather than absolute values" unless detailed parameterisation and validation have been conducted[40]. Statistical agencies should communicate uncertainty alongside point estimates, enabling users to assess fitness for their specific purposes. The tiered approach recommended in SEEA EA supports this, with lower tiers acknowledged to have greater uncertainty but serving valuable purposes for awareness-raising and broad trend analysis[41].
The UN NQAF recommends that "statistical agencies should publish information on the quality of the statistics they compile and disseminate" and that "quality information should include measures of accuracy"[42]. For research data integrated into ocean accounts, this translates to publishing confidence intervals, standard errors, or qualitative uncertainty assessments alongside the data values used in the accounts.
Combining uncertainty components—methods note. Where an account value carries multiple uncertainty components—typically measurement uncertainty, spatial interpolation uncertainty, and model parameter uncertainty—the components must be combined into a single published uncertainty using a method that is both transparent and methodologically defensible. NSOs should apply the following:
- Quadrature combination (root sum of squares) for uncertainty components that are statistically independent and uncorrelated. If $u_1, u_2, \ldots, u_n$ are the independent component uncertainties (expressed as standard uncertainties), the combined uncertainty is $u_c = \sqrt{u_1^2 + u_2^2 + \cdots + u_n^2}$. This is the method specified by the Joint Committee for Guides in Metrology (JCGM 100:2008—Guide to the Expression of Uncertainty in Measurement, "GUM"), the international standard for measurement uncertainty propagation, and should be the default for ocean accounts.
- Arithmetic addition ($u_c = u_1 + u_2 + \cdots + u_n$) as a conservative upper bound where the correlation structure between components is unknown or where components are believed to be positively correlated. This approach overstates uncertainty for independent components and should be clearly identified as a conservative bound when used.
Structural model uncertainty (also called scenario uncertainty or model-form uncertainty) arises when the mathematical structure of a model, not just its parameter values, may be wrong. Because structural uncertainty has no standard probabilistic form, it cannot be combined in quadrature with measurement or parameter uncertainties. It must instead be reported as a separate qualitative statement or as a scenario-based range (e.g., low/central/high scenario outputs from alternative model structures). This separate reporting requirement applies whenever biophysical model outputs are used as primary inputs to ocean accounts. IPCC AR6 Working Group I uncertainty guidance and JCGM 100:2008 Annex notes on model-form uncertainty provide relevant precedent.
The chosen combination method, the components combined, and the assumed correlation structure must be documented alongside the account release. Where uncertainty components have been combined arithmetically because correlation structure is unknown, future work to characterise the correlation structure and move to a GUM-aligned quadrature combination should be flagged in the account quality report.
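The two combination methods in this methods note can be sketched in a few lines. The function names are hypothetical, and the inputs are assumed to be non-negative standard uncertainties in the same units.

```python
import math
from typing import Sequence


def _validated(components: Sequence[float]) -> Sequence[float]:
    # Standard uncertainties are magnitudes, so negative inputs are rejected.
    if any(u < 0 for u in components):
        raise ValueError("uncertainty components must be non-negative")
    return components


def combine_quadrature(components: Sequence[float]) -> float:
    """Root sum of squares, for independent and uncorrelated components."""
    return math.sqrt(sum(u * u for u in _validated(components)))


def combine_arithmetic(components: Sequence[float]) -> float:
    """Plain sum, a conservative bound when correlation is unknown."""
    return sum(_validated(components))
```

As an illustration only, combining a measurement uncertainty of 0.15 mg/L with an assumed interpolation uncertainty of 0.22 mg/L gives roughly 0.27 mg/L in quadrature and 0.37 mg/L arithmetically. Structural uncertainty is deliberately left out of both functions because it is reported separately.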
3.4 Metadata Standards
Comprehensive metadata documentation is essential for integrating research data into statistical systems. Metadata enable data discovery, support quality assessment, and provide the documentation required for reproducible compilation of accounts.
3.4.1 FAIR principles for research data
The FAIR Guiding Principles (Findable, Accessible, Interoperable, Reusable) provide the overarching data management framework that underpins research-data integration. Full definitions and practical requirements for each principle are provided in TG-4.6 Data Harmonisation and Interoperability §3.3. NSOs should prioritise research data sources that adhere to FAIR principles and should advocate for FAIR practices when negotiating data sharing arrangements with research organisations, since non-FAIR research datasets impose additional metadata remediation work at Phase 2 of the compilation procedure (Section 3.5.2).[43]
3.4.2 Domain-specific metadata standards
Several domain-specific metadata standards are relevant for ocean accounting:
ISO 19115 Geographic Information - Metadata is the international standard for geospatial metadata. For spatially-referenced ocean research data, ISO 19115-compliant metadata should be a requirement—it is the target full standard in the Phase 2 metadata alignment procedure (Section 3.5.2). For a complete description of ISO 19115 and its XML encoding (ISO 19139), see TG-4.6 Data Harmonisation and Interoperability §3.1 and §3.6.[44]
Darwin Core is a metadata standard for biodiversity data, providing terms for describing species occurrence records[45]. Darwin Core is the standard used by GBIF and OBIS for aggregating biodiversity observations from distributed sources. Ocean accounting projects drawing on biodiversity survey data should ensure compatibility with Darwin Core terms.
Climate and Forecast (CF) Conventions provide standards for describing climate and forecast data, particularly for gridded data in NetCDF format[46]. CF conventions are widely used in oceanography for describing variables, coordinates, and attributes of ocean model outputs and observational products.
SDMX (Statistical Data and Metadata eXchange) facilitates interoperability between statistical systems; conversion to SDMX-conformant formats supports integration of research data with national statistical systems. For the full SDMX adoption pathway, see TG-4.6 Data Harmonisation §3.1.[47][48]
S-100 Universal Hydrographic Data Model (IHO) covers hydrographic and maritime data products including bathymetry (S-102), surface currents (S-111), and water level information (S-104). Where national hydrographic offices produce S-100 products, these should be considered a primary geospatial data source. For standards details, see TG-4.6 Data Harmonisation §3.1.[49]
3.4.3 Data provenance documentation
Provenance documentation tracks the history of a dataset through its processing chain, enabling users to understand how data have been transformed from raw observations to derived products[50]. For research data used in ocean accounting, provenance should document:
- Original data sources and collection methods
- Processing steps and algorithms applied
- Software and version numbers used
- Personnel responsible for processing
- Dates of processing steps
- Quality control procedures applied
The SEEA Technical Guidance on Biophysical Modelling recommends maintaining "a data provenance system" that "improves users' ability to understand the fitness for purpose of data sets"[51].
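One way to make the provenance elements listed above checkable is to hold them in a structured record. The sketch below is an assumption about how such a record might look; the class and field names are illustrative and do not correspond to any particular provenance standard.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class ProvenanceRecord:
    """One entry per provenance element listed in this section."""
    original_sources: List[str] = field(default_factory=list)
    processing_steps: List[str] = field(default_factory=list)
    software_versions: List[str] = field(default_factory=list)
    personnel: List[str] = field(default_factory=list)
    processing_dates: List[str] = field(default_factory=list)
    qc_procedures: List[str] = field(default_factory=list)

    def missing_elements(self) -> List[str]:
        """Names of provenance elements that are still undocumented."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```

A compiler could call `missing_elements()` before sign-off and record any remaining gaps in the account compilation file.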
3.5 Compilation Procedure
This section outlines a systematic procedure for assessing, acquiring, and integrating research data into ocean accounting programmes. The procedure consists of four phases: research data assessment, metadata alignment, quality assurance, and account integration.
3.5.1 Phase 1: Research data assessment
The first phase involves identifying candidate research data sources and conducting a preliminary fitness assessment:
Step 1.1: Identify data requirements—Determine which components of the ocean accounts require research data inputs. This assessment should be guided by the account structure and the availability of alternative data sources. For example, if ecosystem condition accounts for pelagic waters are planned (see TG-6.5 Pelagic and Open Ocean Accounting), identify which condition variables (dissolved oxygen, chlorophyll-a, sea surface temperature) are available from research programmes versus traditional statistical sources.
Step 1.2: Survey available research data—Conduct a systematic survey of research data sources within the accounting domain. This survey should cover:
- National research institutions (marine laboratories, oceanographic institutes, fisheries research centres)
- International research programmes (GOOS, Argo, regional ocean observing systems)
- Global data repositories (OBIS, GBIF, Copernicus Marine Service)
- Published scientific literature and associated datasets
Step 1.3: Apply integration checklist—For each candidate data source, complete the integration checklist presented in Table 3.5.1. This checklist draws together the quality, metadata, and institutional considerations discussed in Sections 3.3 and 3.4. Each criterion is designated as either mandatory (must be met or formally waived before progression to Phase 2) or conditional (may remain open during Phase 2 or 3, subject to a documented resolution path).
Table 3.5.1: Research Data Integration Checklist
| Integration Criterion | Status | Assessment Questions | Documentation Required |
|---|---|---|---|
| Spatial coverage | Mandatory | Does it cover the accounting area? | Geographic metadata (bounding box, coordinate system) |
| Temporal alignment | Mandatory | Does it match accounting periods? | Date/time stamps, temporal resolution |
| Methodological consistency | Mandatory | Are methods comparable to official statistics? | Methods documentation, peer-reviewed publications |
| Institutional access | Mandatory | Can the NSO access/use the data? | Data sharing agreement, licensing terms |
| Quality assurance | Conditional | What QA procedures were applied? | Quality reports, validation studies |
| Classification alignment | Conditional | Are categories mappable to SEEA/ISIC? | Classification concordance or crosswalk |
| Metadata completeness | Conditional | Are ISO 19115/FAIR metadata available? | Metadata catalogue entry |
| Reproducibility | Conditional | Can results be reproduced from documented inputs? | Code repository, processing documentation |
Phase 1 gate. A data source may proceed to Phase 2 only if all mandatory criteria are met or formally waived. A waiver may be issued only by a named authority within the NSO (typically the head of the responsible statistical unit, or, for Mode C primary substitutions under Section 3.1.1, the chief statistician). The waiver, its scope, its expiry, and the remediation pathway must be recorded in the account compilation file. Conditional criteria that remain open at the gate must be assigned a target resolution date within Phase 2 or Phase 3; failure to resolve a conditional criterion within Phase 3 escalates it to a mandatory issue at the Phase 4 release decision.
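The Phase 1 gate logic can be sketched as follows. The data structure and function names are hypothetical, and the sketch assumes that a waiver is valid only when it names an authority and has not yet expired.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple


@dataclass
class Criterion:
    name: str
    mandatory: bool
    met: bool
    waived_by: Optional[str] = None           # named authority issuing the waiver
    waiver_expiry: Optional[date] = None
    target_resolution: Optional[date] = None  # for open conditional criteria


def phase1_gate(criteria: List[Criterion], today: date) -> Tuple[bool, List[str]]:
    """Return (passes, issues) for the Phase 1 gate."""
    issues: List[str] = []
    for c in criteria:
        if c.met:
            continue
        if c.mandatory:
            waiver_valid = (
                c.waived_by is not None
                and c.waiver_expiry is not None
                and c.waiver_expiry >= today
            )
            if not waiver_valid:
                issues.append(f"{c.name}: mandatory criterion neither met nor validly waived")
        elif c.target_resolution is None:
            issues.append(f"{c.name}: open conditional criterion has no target resolution date")
    return (not issues, issues)
```

The returned issue list is intended to be copied into the account compilation file alongside any waiver details.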
3.5.2 Phase 2: Metadata alignment
Once candidate data sources have been identified, the second phase addresses metadata harmonisation:
Step 2.1: Extract existing metadata—Retrieve available metadata from the research data source. This may exist in ISO 19115 format (for geospatial data), Darwin Core format (for biodiversity observations), or NetCDF-CF format (for oceanographic model outputs).
Step 2.2: Assess metadata completeness against a two-tier standard—Compare existing metadata against the requirements for statistical use. NSOs should apply a two-tier metadata standard.
Statistical minimum (required to proceed to Phase 3 quality assurance). The dataset must have documented:
- Temporal extent (reference period and, where relevant, temporal resolution and update frequency)
- Spatial coverage (geographic extent and coordinate reference system)
- Measurement method (instrument, protocol, or modelling approach)
- Uncertainty estimate (quantitative where available; qualitative otherwise)
- Access terms (licence, citation requirements, restrictions)
Where formal metadata records (e.g., ISO 19115, Darwin Core) are absent—as is common for high-quality legacy datasets and many older oceanographic programmes—the statistical minimum may be satisfied using metadata extracted from a peer-reviewed publication that documents the dataset. The compiling NSO must record the source of each minimum element, including the publication citation and section reference where applicable.
Full standard (long-term target for repeated and high-profile sources). The dataset is described by a fully-populated ISO 19115 record (for geospatial data) or Darwin Core record (for biodiversity data), including all of the elements identified by the UN NQAF Level D essential metadata list: identification (title, abstract, keywords); temporal extent (reference period, temporal resolution, update frequency); spatial extent (geographic coverage, coordinate reference system); data quality (accuracy, completeness, consistency); lineage (data sources, processing steps); distribution (access constraints, usage licences); and contact information (data custodian, responsible party)[52].
A dataset meeting the statistical minimum may progress to Phase 3 while work to reach the full standard is undertaken in parallel.
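A check against the statistical minimum might be sketched as below. The key names and the shape of the metadata dictionary are assumptions for illustration; each element carries both a value and the source it was taken from, reflecting the requirement to record the source of each minimum element.

```python
from typing import Dict, List

# The five elements of the statistical minimum (key names are illustrative).
STATISTICAL_MINIMUM = (
    "temporal_extent",
    "spatial_coverage",
    "measurement_method",
    "uncertainty_estimate",
    "access_terms",
)


def missing_minimum_elements(metadata: Dict[str, Dict[str, str]]) -> List[str]:
    """List elements lacking either a documented value or a recorded source."""
    missing = []
    for element in STATISTICAL_MINIMUM:
        entry = metadata.get(element, {})
        if not entry.get("value") or not entry.get("source"):
            missing.append(element)
    return missing
```

An empty result indicates that the dataset may progress to Phase 3 under the statistical minimum, with work toward the full standard continuing in parallel.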
Step 2.3: Fill metadata gaps—Where research data lack statistical metadata elements, work with the data provider to document missing information. Priority gaps include: correspondence to statistical classifications (e.g., mapping research ecosystem types to IUCN GET categories), uncertainty quantification (confidence intervals, accuracy assessments), and update schedules (will data be available on a recurring basis to support time series accounts?).
Step 2.4: Document provenance—Create or enhance provenance documentation following the framework in Section 3.4.3. For research data that have undergone multiple processing steps (e.g., satellite imagery processed to ocean colour products, acoustic survey data processed to biomass estimates), the provenance chain must be fully documented to support reproducibility.
3.5.3 Phase 3: Quality assurance
The third phase applies the quality assessment framework from Section 3.3:
Step 3.1: Assess relevance (as defined in Section 3.3.1)—Verify that the research data address the specific accounting requirements identified in Phase 1, considering both conceptual fit (do the measured variables correspond to the accounting concepts?) and practical fit (are the data sufficiently timely, granular, and complete?).
Step 3.2: Evaluate accuracy (as defined in Section 3.3.1)—Assess accuracy using available validation studies, inter-comparison exercises, or ground-truthing campaigns. For satellite-derived products, consult published accuracy assessments. For modelled outputs, assess model skill metrics against independent observations.
Step 3.3: Test coherence (as defined in Section 3.3.1)—Verify that research data can be combined with other account data sources. Coherence testing should identify discrepancies in spatial boundaries, temporal reference periods, or measurement units that require harmonisation (see TG-4.6 Data Harmonisation).
Where co-located independent monitoring is available (e.g., research station data co-located with an environmental protection agency monitoring station), coherence is tested by direct comparison of measurements at common locations and times. Where co-located data are unavailable—which is the typical situation for pelagic, open-ocean, and deep-shelf accounts—the following fallback coherence testing hierarchy applies. At least one fallback test must be completed and documented:
- Temporal climatology test. Compare research data values to climatological norms from global ocean databases (e.g., NOAA World Ocean Atlas, Copernicus Marine Service reanalyses) for the matching season, depth, and biogeographic region. Values falling outside the 5th-95th percentile range of the climatology should be reviewed for outlier status and either corrected, flagged, or accepted with documented justification.
- Cross-variable coherence test. Verify that correlated variables observed in the same dataset show expected relationships (e.g., dissolved oxygen and temperature follow expected solubility relationships; chlorophyll-a and nitrate show expected biogeochemical coupling). Departures from expected relationships indicate either a real ecosystem signal or a measurement issue and must be examined.
- Literature-based plausibility check. Compare summary statistics (mean, range, seasonal cycle) to published ranges for equivalent ecosystem types in the peer-reviewed literature. Values outside the published range should be reviewed for plausibility.
These fallback tests are not substitutes for direct co-location where it is feasible, but they ensure that the coherence check does not silently fail in the open-ocean and deep-water contexts where research data are most essential.
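The temporal climatology test can be sketched with the Python standard library. The function name is hypothetical, and the sketch assumes the climatology is supplied as a sample of values for the matching season, depth, and region; the "inclusive" percentile method is one reasonable choice among several.

```python
from statistics import quantiles
from typing import List, Sequence


def climatology_outliers(
    values: Sequence[float], climatology: Sequence[float]
) -> List[bool]:
    """Flag values outside the 5th-95th percentile range of the climatology."""
    # With n=100 the result holds 99 cut points; index 4 is the 5th
    # percentile and index 94 is the 95th percentile.
    cuts = quantiles(climatology, n=100, method="inclusive")
    p05, p95 = cuts[4], cuts[94]
    return [v < p05 or v > p95 for v in values]
```

Flagged values are candidates for review only; whether each is corrected, flagged, or accepted still requires the documented justification described above.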
Step 3.4: Check comparability (as defined in Section 3.3.1)—Document any methodological changes that create breaks in time-series comparability, and verify consistent methods across geographic domains.
Step 3.5: Quantify uncertainty—Where feasible, quantify the uncertainty associated with research data values. This may take the form of standard errors (for survey-based estimates), confidence intervals (for modelled values), or qualitative uncertainty categories (high/medium/low confidence). Uncertainty estimates should be documented in metadata and, where appropriate, published alongside account values. Where multiple uncertainty components must be combined into a single published figure, follow the methods note in Section 3.3.3.
3.5.4 Phase 4: Account integration
The final phase integrates quality-assured research data into ocean accounts:
Step 4.1: Apply classification concordances—Where research data use different classifications from statistical standards, apply the concordances or crosswalks developed in Phase 2. For example, if research biodiversity data use scientific taxonomic names but the account structure requires aggregation to functional groups, apply the taxonomic-to-functional-group mapping.
Step 4.2: Reconcile spatial and temporal boundaries—Align research data to the spatial and temporal structure of the accounts. This may require spatial aggregation (from fine-resolution survey points to accounting spatial units), temporal aggregation (from monthly observations to annual accounting periods), or gap-filling (interpolating missing values).
Spatial interpolation method selection. The choice of spatial interpolation method must be justified by the density and spatial distribution of the source data, not selected by default. Table 3.5.4 specifies the recommended method as a function of station density and spatial pattern. The choice and its justification must be recorded in the account compilation file.
Table 3.5.4: Recommended spatial interpolation method by station density and distribution
| Station density / distribution | Recommended method | Notes |
|---|---|---|
| Dense (>30 stations over the accounting area) with approximately random or stratified spatial distribution | Ordinary kriging (or universal kriging where trend is present) | Variogram should be fitted and inspected; cross-validation mandatory |
| Moderate density (10-30 stations) with clustered or coastal-biased distribution | Inverse distance weighting (IDW) with documented search radius | Sensitive to clustering; report edge-effect zones as higher-uncertainty |
| Sparse coverage (<10 stations) or strongly heterogeneous environment | Thiessen (Voronoi) polygon assignment, or assignment by ecosystem stratum mean | Resulting account values must be flagged as Tier 1 (broad indicative) under SEEA EA tiering |
| Sparse and gridded reference data available | Regression-based methods (e.g., regression-kriging) using auxiliary covariates (bathymetry, SST) | Requires documented covariate justification |
Cross-validation is mandatory whichever interpolation method is selected. Leave-one-out cross-validation should produce a root-mean-square error (RMSE) no greater than the tolerance threshold, defined as the published measurement uncertainty of the source data multiplied by 1.5 (i.e., tolerance threshold = measurement uncertainty × 1.5, an allowance of 50% above the measurement uncertainty). An RMSE exceeding this threshold indicates that interpolation uncertainty dominates and that the interpolated product must be downgraded in tier, supplemented with additional stations, or replaced with a coarser spatial aggregation.
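Leave-one-out cross-validation and the tolerance check can be sketched as follows, using inverse distance weighting as the interpolator. This is an illustrative sketch under stated assumptions: station coordinates are taken to be planar and in kilometres, the power parameter of 2 is a common default rather than a requirement of this Circular, and all function names are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Station = Tuple[float, float, float]  # (x_km, y_km, observed value)


def idw(x: float, y: float, stations: List[Station],
        radius_km: float, power: float = 2.0) -> Optional[float]:
    """Inverse distance weighted estimate; None if no station is in range."""
    num = den = 0.0
    for sx, sy, value in stations:
        d = math.hypot(x - sx, y - sy)
        if d > radius_km:
            continue
        if d == 0.0:
            return value  # exact hit on a station
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den if den > 0 else None


def loocv_rmse(stations: List[Station], radius_km: float,
               power: float = 2.0) -> float:
    """RMSE from predicting each station using all the others."""
    squared_errors = []
    for i, (x, y, value) in enumerate(stations):
        others = stations[:i] + stations[i + 1:]
        predicted = idw(x, y, others, radius_km, power)
        if predicted is not None:  # skip stations with no neighbour in range
            squared_errors.append((predicted - value) ** 2)
    if not squared_errors:
        raise ValueError("no station has a neighbour within the search radius")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def within_tolerance(rmse: float, measurement_uncertainty: float,
                     factor: float = 1.5) -> bool:
    """Check RMSE against measurement uncertainty multiplied by 1.5."""
    return rmse <= measurement_uncertainty * factor
```

Stations with no neighbour inside the search radius are skipped here, which would understate the error in poorly covered areas; a production implementation would report how many stations were skipped.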
Step 4.3: Document data sources—Record the use of research data in the account compilation metadata. Documentation should identify: the research data source (with citation and persistent identifier), the account components that use the research data, the processing steps applied, and the quality assessment results. This documentation supports transparency and reproducibility.
Step 4.4: Establish update procedures—Where research data will be used on a recurring basis for time series accounts, establish procedures for data updates. Coordinate with research data providers to understand their publication schedule and arrange for regular data transfers. Monitor for methodological changes that may affect comparability across accounting periods.
Back-revision policy for retroactively corrected research data. Research datasets are frequently revised: stock assessments are updated as new cohorts enter the record, satellite data products undergo reprocessing, and bathymetric and hydrographic surveys are revised as better data become available. A retrospective revision to historical research data can propagate through multiple accounting periods of a time series account, creating apparent trends that are artefacts of data revision rather than real ecosystem change. NSOs must adopt a back-revision policy with the following elements:
- Monitoring. NSOs should subscribe to dataset versioning notices from research data providers and incorporate a versioning check into the standing data transfer protocol (Section 3.7.4).
- Materiality threshold. A retrospective revision triggers formal revision of previously published accounts where it produces a change of more than 5% in any key account indicator for any covered accounting period, or where it materially changes the direction of a published trend. Below the materiality threshold, the revision is recorded in the next account release without re-publication of historical accounts.
- Version identification. Every account release must identify the specific version (DOI or equivalent persistent identifier) of each research dataset used. This allows users to trace account values to specific dataset versions and supports reconstruction of historical account values.
- Metadata revision history. The account compilation metadata must include a revision history listing the date, dataset, version change, materiality assessment, and the action taken (re-published, recorded in next release, or no action).
This policy aligns with SEEA EA guidance on revision policy in statistical frameworks and with UN NQAF Level B revision policy requirements.
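The materiality threshold can be sketched as a comparison of the published and revised series for one key indicator. Two points are assumptions made for the sketch: the direction of a trend is taken as the sign of the change between the first and last accounting periods, and a revision away from a published value of zero is always treated as material.

```python
from typing import Dict


def _direction(series: Dict[int, float]) -> int:
    """Sign of the change between the first and last period (-1, 0, or 1)."""
    periods = sorted(series)
    change = series[periods[-1]] - series[periods[0]]
    return (change > 0) - (change < 0)


def revision_is_material(published: Dict[int, float],
                         revised: Dict[int, float],
                         threshold: float = 0.05) -> bool:
    """True if any period changes by more than 5% or the trend reverses."""
    for period, old in published.items():
        new = revised[period]
        if old == 0:
            if new != 0:
                return True
            continue
        if abs(new - old) / abs(old) > threshold:
            return True
    return _direction(published) != _direction(revised)
```

The result of this check, together with the dataset version identifiers, would populate the materiality assessment column of the metadata revision history.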
3.6 Worked Example: Integrating Oceanographic Survey Data into Condition Accounts
This worked example demonstrates the application of the compilation procedure to a realistic scenario: a National Statistical Office seeking to compile ecosystem condition accounts for coastal shelf waters using dissolved oxygen data from a national oceanographic research programme.
Setting: A coastal nation with 150,000 km² of exclusive economic zone (EEZ) shelf waters (depths <200m) seeks to compile annual condition accounts for the Marine Shelf (M1) ecosystem type following the IUCN Global Ecosystem Typology. One of the selected condition variables is dissolved oxygen concentration, which serves as an indicator of ecosystem health and hypoxia risk. The NSO has identified the National Oceanographic Research Institute (NORI) as a potential data provider.
Note on illustrative scope. This worked example uses a single condition variable (dissolved oxygen) and a linear-rescaling condition index purely for pedagogical clarity. SEEA EA Table 5.3 envisages multi-variable condition indices that combine abiotic chemical, abiotic physical, biotic compositional, structural, and functional characteristics, and a production-grade Marine Shelf (M1) condition account would aggregate dissolved oxygen with at least temperature, chlorophyll-a, and a biotic indicator. The dissolved oxygen reference (8.0 mg/L) and minimum-threshold (2.0 mg/L hypoxia) values used below are drawn from the OSPAR Ecological Quality Objectives for dissolved oxygen; national standards (e.g., EU Water Framework Directive good ecological status thresholds, regional sea convention values) may differ and should be used where authoritative locally.
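For orientation, a linear-rescaling condition index is commonly computed by mapping the minimum threshold to 0 and the reference level to 1. The sketch below assumes that form, with values clipped to the range 0 to 1, and uses the 8.0 mg/L and 2.0 mg/L values quoted above as defaults; the function name is hypothetical.

```python
def condition_index(value: float, reference: float = 8.0,
                    minimum: float = 2.0) -> float:
    """Rescale a dissolved oxygen value (mg/L) to a 0-1 condition index."""
    if reference == minimum:
        raise ValueError("reference and minimum must differ")
    scaled = (value - minimum) / (reference - minimum)
    # Clip so values beyond the reference or below the minimum stay in [0, 1].
    return max(0.0, min(1.0, scaled))
```

Under these defaults, a concentration of 5.0 mg/L sits halfway between the hypoxia threshold and the reference level and returns an index of 0.5.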
Phase 1: Research data assessment
Step 1.1: Identify data requirements—The condition account requires dissolved oxygen measurements representative of the shelf ecosystem. Following SEEA EA guidance, the reference condition is defined as the dissolved oxygen level corresponding to a healthy, well-mixed shelf ecosystem (typically 6-8 mg/L). The account structure requires annual average values aggregated to ecosystem asset spatial units. Applying the Section 3.1.1 integration-mode typology, the NSO classifies this as Mode A (gap-filling): no official statistical source for subsurface dissolved oxygen exists in this jurisdiction.
Step 1.2: Survey available data—NORI conducts quarterly oceanographic surveys at 45 fixed stations distributed across the shelf. Each station is sampled at 5 depth intervals (surface, 25m, 50m, 75m, 100m). Dissolved oxygen is measured using calibrated Winkler titration (precision ±0.1 mg/L). The programme has operated continuously since 2010 with consistent methodology. Data are archived in NORI's institutional repository.
Step 1.3: Apply integration checklist—Applying Table 3.5.1 with mandatory/conditional designations:
| Criterion | Status | Assessment Result | Documentation |
|---|---|---|---|
| Spatial coverage | Mandatory | 45 stations cover 150,000 km² shelf area; spatial interpolation required | Station coordinates in WGS84 |
| Temporal alignment | Mandatory | Quarterly surveys provide seasonal coverage; annual averaging feasible | Survey dates documented per cruise |
| Methodological consistency | Mandatory | Winkler titration is standard oceanographic method | NORI Standard Operating Procedures manual |
| Institutional access | Mandatory | NORI willing to share data; MoU required | Draft MoU provided by NORI legal office |
| Quality assurance | Conditional | Inter-laboratory bias <0.15 mg/L vs. IOC/GOOS ±0.2 mg/L standard | QA reports 2012-2020 |
| Classification alignment | Conditional | Dissolved oxygen is a standard SEEA EA condition variable | SEEA EA Table 5.3 |
| Metadata completeness | Conditional | Station metadata exist; cruise-level metadata incomplete | ISO 19115 records for stations |
| Reproducibility | Conditional | Raw titration data archived; processing code not version-controlled | NORI agrees to deposit in GitHub |
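Phase 1 gate finding: Three of the four mandatory criteria are met outright. The fourth, institutional access, is pending MoU finalisation and proceeds under a formal waiver issued by the NSO head of environmental statistics with a 90-day remediation window, recorded in the account compilation file as Section 3.5.1 requires. Conditional criteria are open but on a documented resolution path. The source proceeds to Phase 2.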
Phase 2: Metadata alignment
Step 2.1: Extract existing metadata—NORI provides station-level metadata in CSV format including: station ID, latitude, longitude, depth, seafloor substrate type, and sampling history. Dissolved oxygen data are provided in a separate CSV with fields: station ID, cruise ID, date, depth, dissolved oxygen (mg/L), temperature (°C), salinity (PSU).
Step 2.2: Assess metadata completeness against the two-tier standard (Section 3.5.2)—Existing metadata are evaluated against the statistical minimum: temporal extent (partially documented; gaps in cruise-level dates), spatial coverage and CRS (CRS implicit, must be made explicit), measurement method (documented), uncertainty estimate (available from QA reports), access terms (under MoU). The dataset meets the statistical minimum once CRS is explicitly documented and cruise dates are filled in. The full ISO 19115 standard is set as a 12-month target.
Step 2.3: Fill metadata gaps—NSO and NORI jointly develop enhanced metadata including:
- Temporal coverage: Date range for each quarterly cruise added to cruise metadata table
- Spatial reference: Coordinate reference system explicitly documented as WGS84 (EPSG:4326)
- Quality flags: NORI applies automated QC checks (range test, climatology test, spike test) following GOOS recommendations and adds QC flags to data file (1=good, 2=probably good, 3=probably bad, 4=bad)
- Provenance: Processing workflow documented: raw titration volume → dissolved oxygen calculation using modified Winkler equation → temperature and salinity correction → final value in mg/L
Step 2.4: Document provenance—Provenance record created:
Dissolved oxygen values are measured using Winkler titration following the GOOS BioEco Panel recommendations. Seawater samples are collected using Niskin bottles mounted on a CTD rosette. Titration is performed shipboard within 6 hours of collection. Raw titration volumes are converted to dissolved oxygen concentration using the modified Winkler equation with temperature and salinity corrections applied. Processing code (Python) is version-controlled at https://github.com/NORI/oceanography/DO-processing [fictional URL] (Reproducibility Level 2 per Section 3.3.2). Quality control follows GOOS Real-Time Quality Control procedures (GOOS, 2021).
Phase 3: Quality assurance
Step 3.1: Assess relevance—The dissolved oxygen data directly address the condition account requirement for chemical state characteristics. Quarterly temporal resolution provides adequate seasonal coverage for annual aggregation. Spatial coverage (45 stations across 150,000 km²) is sparser than ideal but sufficient for broad-scale condition assessment given the relatively homogeneous shelf environment.
Step 3.2: Evaluate accuracy—NORI's QA reports show inter-laboratory comparison results within ±0.15 mg/L of reference standards, satisfying the IOC/GOOS ±0.2 mg/L accuracy threshold referenced in Table 3.3.1. Sampling precision (replicate measurements at same station) averages ±0.08 mg/L. The accuracy dimension is rated "Acceptable without qualification".
Step 3.3: Test coherence—NSO compares dissolved oxygen data against coastal water quality monitoring data from the environmental protection agency at 12 co-located stations. Average difference is 0.12 mg/L (within measurement uncertainty), confirming coherence. For shelf-edge stations where no co-located independent monitoring exists, the NSO additionally applies the temporal climatology test against NOAA World Ocean Atlas seasonal climatology: all 45 stations' annual means fall within the 5th-95th percentile range of climatological values for the matching region and depth, confirming the absence of systematic bias.
Step 3.4: Check comparability—NORI methodology has remained unchanged since programme inception (2010). All data are directly comparable over time. Spatial comparability verified by consistent station locations and sampling protocols.
Step 3.5: Quantify uncertainty—Based on QA assessment, dissolved oxygen values carry measurement uncertainty of ±0.15 mg/L (combining measurement precision and inter-laboratory comparison). This is combined in quadrature with spatial interpolation uncertainty (Step 4.2 cross-validation) per the JCGM 100:2008 (GUM) procedure described in Section 3.3.3. Combined uncertainty $u_c = \sqrt{(0.15)^2 + u_{spatial}^2}$ varies by asset unit depending on distance to the nearest station.
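As a worked illustration of the quadrature formula above, the sketch below combines the fixed measurement uncertainty with a spatial interpolation uncertainty. The function name and the two spatial uncertainty values are assumptions chosen for illustration; they are not taken from the NORI dataset.

```python
import math

def combined_uncertainty(u_measurement, u_spatial):
    """Combine two independent standard uncertainties in quadrature."""
    return math.sqrt(u_measurement ** 2 + u_spatial ** 2)

U_MEASUREMENT = 0.15  # mg/L, measurement uncertainty from Step 3.5

# Hypothetical spatial interpolation uncertainties (mg/L) for an asset unit
# close to a station and one far from any station.
for u_spatial in (0.10, 0.30):
    u_c = combined_uncertainty(U_MEASUREMENT, u_spatial)
    print(f"u_spatial = {u_spatial:.2f} mg/L -> u_c = {u_c:.2f} mg/L")
```

Because the terms add as squares, the larger component dominates: for asset units far from a station the combined value is driven almost entirely by the interpolation term.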
Phase 4: Account integration
Step 4.1: Apply classification concordances—No classification mapping required; dissolved oxygen is used directly.
Step 4.2: Reconcile spatial and temporal boundaries—The accounting area is divided into 500 ecosystem asset spatial units (300 km² each) based on seabed substrate type. Applying the Table 3.5.4 method-selection rubric: with 45 stations over 150,000 km² in an approximately stratified spatial distribution, the rubric points to ordinary kriging. However, because variogram fitting at this station density is unstable, the NSO uses inverse distance weighting with a 30 km search radius as a pragmatic alternative and documents the reasoning. Mandatory leave-one-out cross-validation yields RMSE = 0.22 mg/L, within the 50% tolerance over the measurement uncertainty (0.15 × 1.5 = 0.225 mg/L), so the interpolated product is accepted. Quarterly values are averaged to produce annual mean dissolved oxygen per asset unit.
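The interpolation and acceptance logic in Step 4.2 can be sketched in a few lines. This is an illustrative simplification, not the NSO's script: it assumes planar coordinates in kilometres (a real implementation would work in a projected or geodesic coordinate system), a distance power of 2, and four hypothetical stations. Only the 30 km search radius, the 0.15 mg/L measurement uncertainty, and the 50% tolerance come from the text.

```python
import math

def idw(target, stations, radius_km=30.0, power=2.0):
    """Inverse-distance-weighted estimate at `target` (x_km, y_km).

    `stations` is a list of (x_km, y_km, value). Returns None when no
    station lies within the search radius.
    """
    num = den = 0.0
    for x, y, value in stations:
        d = math.hypot(x - target[0], y - target[1])
        if d > radius_km:
            continue
        if d == 0.0:
            return value  # target coincides with a station
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den if den else None

def loo_rmse(stations, radius_km=30.0, power=2.0):
    """Leave-one-out cross-validation RMSE over stations that have neighbours."""
    sq_errors = []
    for i, (x, y, observed) in enumerate(stations):
        others = stations[:i] + stations[i + 1:]
        predicted = idw((x, y), others, radius_km, power)
        if predicted is not None:
            sq_errors.append((predicted - observed) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors)) if sq_errors else None

def accept(rmse, u_measurement=0.15, tolerance=0.5):
    """Acceptance rule from Step 4.2: RMSE <= u_measurement * (1 + tolerance)."""
    return rmse <= u_measurement * (1.0 + tolerance)

# Hypothetical stations: (x_km, y_km, dissolved oxygen in mg/L)
stations = [(0, 0, 6.0), (20, 0, 6.2), (0, 20, 6.1), (20, 20, 6.3)]
rmse = loo_rmse(stations)
print(f"LOOCV RMSE = {rmse:.3f} mg/L, accepted = {accept(rmse)}")
print(accept(0.22))  # the RMSE reported in Step 4.2 passes the same rule
```

Stations with no neighbour inside the search radius are skipped in the cross-validation; a production workflow would report how many were skipped, since those are also the areas where asset units receive no estimate.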
Step 4.3: Document data sources—Account compilation metadata records:
Dissolved oxygen condition data sourced from National Oceanographic Research Institute Quarterly Shelf Survey (NORI-QSS), 2015-2020, dataset version 2021.1 (DOI: 10.xxxx/nori-qss-2021.1) [fictional]. Data access via MoU between NSO and NORI dated 2021-03-15. Data processing conducted by NSO Environmental Accounts Unit using scripts deposited at https://github.com/NSO/ocean-accounts/condition-processing [fictional URL]. Integration mode: A (gap-filling). Spatial interpolation: inverse distance weighting, 30 km search radius (justified per Table 3.5.4; cross-validation RMSE 0.22 mg/L). Temporal aggregation: arithmetic mean of quarterly values. Uncertainty combined in quadrature per JCGM 100:2008: ±0.15 mg/L measurement plus variable spatial interpolation uncertainty.
Step 4.4: Establish update procedures—NSO and NORI agree that NORI will provide annual data extracts by 31 March each year (covering the previous calendar year). NSO will re-run spatial interpolation and update condition accounts by 30 June. NORI will notify NSO of any methodological changes at least 6 months prior to implementation. Back-revision policy applied: NORI dataset versioning notices are subscribed to; any retrospective revision producing >5% change in the asset-unit condition index for any previously published year triggers re-publication of affected accounts. Each account release identifies the NORI dataset DOI.
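The back-revision trigger in Step 4.4 lends itself to an automated check. The sketch below is one possible reading: it interprets ">5% change" as a relative change in the condition index, which is an assumption, and the function name and index values are hypothetical.

```python
def needs_republication(published, revised, threshold=0.05):
    """Return the years whose condition index changed by more than
    `threshold` (relative change) after a retrospective dataset revision."""
    flagged = []
    for year, old in published.items():
        new = revised.get(year)
        if new is None or old == 0:
            continue  # nothing to compare, or relative change undefined
        if abs(new - old) / abs(old) > threshold:
            flagged.append(year)
    return sorted(flagged)

# Hypothetical condition indices for one asset unit, before and after a revision
published = {2018: 0.70, 2019: 0.62, 2020: 0.57}
revised = {2018: 0.71, 2019: 0.66, 2020: 0.57}
print(needs_republication(published, revised))
```

Running the check across all asset units whenever a versioning notice arrives would give the NSO a list of the account years that need re-publication.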
Resulting condition account entry (example for one asset unit):
| Accounting year | Dissolved oxygen (mg/L) | Indicator value | Uncertainty |
|---|---|---|---|
| 2015 | 6.8 | 0.80 (good condition) | ±0.35 mg/L |
| 2016 | 6.5 | 0.75 (good condition) | ±0.33 mg/L |
| 2017 | 5.9 | 0.65 (moderate condition) | ±0.38 mg/L |
| 2018 | 6.2 | 0.70 (moderate condition) | ±0.36 mg/L |
| 2019 | 5.7 | 0.62 (moderate condition) | ±0.40 mg/L |
| 2020 | 5.4 | 0.57 (moderate condition) | ±0.42 mg/L |
Note: Condition is classified as "good" where the indicator value is 0.75 or above, and "moderate" where it falls below 0.75. Uncertainty values vary across years because some monitoring stations had intermittent data gaps in later years, increasing spatial interpolation distance and the combined-in-quadrature uncertainty for this asset unit.
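The indicator values in the table are consistent with a linear rescaling of dissolved oxygen between a lower reference level of 2.0 mg/L and an upper reference level of 8.0 mg/L. Those reference levels are inferred from the table values and are not stated in the example, so they should be read as an assumption. Under that assumption, the rescaling and the classification rule in the note can be sketched as:

```python
def condition_indicator(do_mg_l, lower=2.0, upper=8.0):
    """Linearly rescale dissolved oxygen to a 0-1 indicator, clipped to [0, 1].
    The reference levels are assumed, inferred from the example table."""
    value = (do_mg_l - lower) / (upper - lower)
    return min(1.0, max(0.0, value))

def condition_class(indicator, good_threshold=0.75):
    """Classify per the table note: 'good' at or above 0.75, otherwise 'moderate'."""
    return "good" if indicator >= good_threshold else "moderate"

# Annual mean dissolved oxygen (mg/L) for the asset unit in the table
annual_do = {2015: 6.8, 2016: 6.5, 2017: 5.9, 2018: 6.2, 2019: 5.7, 2020: 5.4}
for year, do in annual_do.items():
    indicator = round(condition_indicator(do), 2)
    print(year, f"{indicator:.2f}", condition_class(indicator))
```

The classification is applied to the rounded indicator so that the published value and the published class cannot disagree at the threshold.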
Outcome: The NSO has successfully integrated research data from NORI into the ecosystem condition account. The dissolved oxygen time series reveals a declining trend from good to moderate condition over 2015-2020, prompting policy attention to potential eutrophication drivers. The documented uncertainty estimates and provenance information support transparent communication of account results and reproducible updates in future accounting periods.
3.7 Institutional Arrangements
Effective integration of research data into official statistics requires institutional arrangements that bridge the different cultures, incentives, and practices of statistical offices and research organisations.
3.7.1 Roles of National Statistical Offices
The SEEA Technical Recommendations identify several roles that NSOs can play in ecosystem accounting that are relevant to research data integration. Table 3.7.1 below summarises these roles.[53]
| Role | Description |
|---|---|
| Data organisation | NSOs have expertise in collecting and organising data from diverse sources, building coherent pictures from varied inputs. |
| Standards stewardship | NSOs establish and maintain definitions, concepts, and classifications, addressing the multiple definitions common in research contexts. |
| Data integration | NSOs integrate data from various sources within national and international statistical frameworks. |
| Quality frameworks | NSOs apply data quality frameworks enabling consistent assessment and accreditation of information sources. |
| National coverage | NSOs create national pictures, applying techniques for scaling information to national level. |
| Authority | NSOs present an authoritative voice through application of standard measurement approaches and quality frameworks. |
While NSOs may not have deep expertise in marine science, they bring essential capabilities for transforming research data into official statistics[54].
3.7.2 Roles of research institutions
Research institutions contribute domain expertise, data collection infrastructure, and methodological innovation. The SEEA Technical Recommendations note that "agencies that lead work on geographic and spatial data—particularly the mapping of environmental data and the use of remote sensing information—including for spatial and temporal modelling of ecosystem services" play important roles[55]. For ocean accounting, relevant research institutions include:
- Universities with marine science programmes
- National oceanographic and hydrographic agencies
- Fisheries research institutes
- Environmental monitoring agencies
- International research programmes (e.g., IOC, ICES)
These institutions are often the primary custodians of ocean observation data, biodiversity records, and ecosystem assessments needed for ocean accounts.
3.7.3 Establishing partnerships
The Global Statistical Geospatial Framework (GSGF) provides general guidance on collaboration between statistical offices and geospatial agencies[56], but agreements with research institutions need to address dimensions that NSO-geospatial agency templates (such as the UN-GGIM MoU template) do not cover. Research institutions operating under university or grant-funded regimes have data governance requirements—publication embargo periods, intellectual property ownership, attribution under Creative Commons or funder mandates—that differ materially from those of national mapping agencies. NSOs negotiating data sharing arrangements with research partners should therefore use a purpose-built research data sharing agreement rather than carrying over the UN-GGIM template unchanged. The OECD Principles and Guidelines for Access to Research Data from Public Funding (2007) provide the governing international framework.
Research data sharing agreement checklist. A research data sharing agreement should address the following at minimum:
- Intellectual property ownership. Identify the rights holders of the data (institution, individual researcher, funder, or joint), confirm that the NSO has the rights necessary to ingest, transform, and republish the data in statistical products, and record any conditions imposed by funders (e.g., European Research Council, NIH, Wellcome, NSF).
- Publication embargo periods. Where the research team needs to publish primary findings before the data are released, specify the embargo duration. The maximum acceptable embargo for ocean-accounting purposes is 6 months from the end of the data collection period, consistent with the annual account publication cycle: data collected by 31 December must be available to the NSO by 30 June of the following year to meet standard account publication deadlines (see Step 4.4). Shorter embargoes should be sought wherever possible. Embargoes beyond 6 months are not compatible with the annual publication cycle of official accounts and should not be accepted; where a research partner requires a longer embargo as a condition of data sharing, this should be escalated to the chief statistician as an exceptional governance case. Note that many major international funder mandates (e.g., Horizon Europe, NSF) require data availability within 12 months of collection—this funder deadline does not override the NSO's 6-month operational requirement.
- Attribution requirements. Specify the citation form required by the data provider (data DOI plus accompanying publication) and confirm that the NSO will discharge these in account release documentation.
- Data lifecycle. Address versioning (how new versions will be announced and made available), deprecation (how withdrawn versions will be handled), and the back-revision interaction (Section 3.5.4 Step 4.4).
- Confidentiality of pre-publication data. Where the NSO is provided access to pre-publication data, specify the handling regime (named individuals, secured storage, no onward sharing) and the point at which the data may be released into the NSO's normal compilation environment.
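The embargo rule in the checklist reduces to simple date arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and it treats "6 months" as six calendar months from the end of the collection period.

```python
import calendar
from datetime import date

def add_months(d, months):
    """Add whole months to a date, clamping to the last day of the target month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def embargo_acceptable(collection_end, release_date, max_months=6):
    """True if the data reach the NSO within `max_months` of the end of collection."""
    return release_date <= add_months(collection_end, max_months)

collection_end = date(2020, 12, 31)
print(add_months(collection_end, 6))                          # latest acceptable release
print(embargo_acceptable(collection_end, date(2021, 6, 30)))  # six-month embargo
print(embargo_acceptable(collection_end, date(2021, 12, 31))) # twelve-month embargo
```

A check of this kind can be run when an agreement is drafted, so that an embargo clause incompatible with the publication cycle is identified before signature rather than at the first data delivery.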
Communities of practice. Regular engagement through working groups or committees maintains relationships and addresses emerging issues. The SEEA Technical Recommendations emphasise that "appropriate institutional arrangements and resourcing to support ongoing engagement and communication are also required"[57].
Capacity building. Joint training and skill-sharing activities build mutual understanding between statistical and research communities. Research personnel may need orientation on statistical concepts and quality frameworks; statistical personnel may need training on oceanographic data and methods.
Several countries provide practical models for NSO-research institution partnerships in ocean accounting. In Australia, the ABS compiles environmental-economic accounts in partnership with DCCEEW, drawing on scientific data inputs from CSIRO and IMOS.[58] In the Netherlands, Statistics Netherlands (CBS) has worked with Wageningen University and NIOZ to compile experimental natural capital accounts for the North Sea using research-derived biophysical models and real-time sensor networks.[58] Successful partnerships require sustained engagement over multiple accounting cycles, clear allocation of responsibilities, and mutual recognition of complementary capabilities.
3.7.4 Data transfer protocols
The SEEA Technical Note on Air Emission Accounts recommends establishing "data transfer protocols" given that "data may be acquired from a number of institutions or agencies"[59]. Such protocols should address:
- Data formats and transmission methods
- Timing and frequency of data provision
- Procedures for handling system changes and upgrades
- Metadata to be provided with each data transfer
- Dataset versioning notices (to support the Section 3.5.4 back-revision policy)
- Feedback mechanisms for data quality issues
Establishing robust protocols prevents disruption to statistical production when research systems are upgraded or personnel change.
3.7.5 Guidance for low-capacity and SIDS contexts
The institutional arrangements set out in Sections 3.7.1-3.7.4 assume that one or more national research institutions exist with which the NSO can establish a formal partnership. For many small island developing states (SIDS), least developed countries, and other contexts where national research capacity is still developing, ocean accounts may need to be compiled primarily, and in some cases entirely, using data sourced from international repositories and global research programmes. This is the dominant scenario for a substantial part of the GOAP target audience; the absence of a domestic research institution partner should not be treated as a barrier to compiling ocean accounts.
The following adapted procedure applies where no national research institution partner exists:
- Use pre-validated global data products as research data inputs without a national MoU. Acceptable sources include the Copernicus Marine Service (validated ocean colour, sea surface temperature, sea level, and biogeochemistry products), GOOS Argo (subsurface temperature, salinity, dissolved oxygen), and OBIS / GBIF (biodiversity occurrence). These products are issued under open licences that permit statistical use without bespoke agreements. The product DOI or persistent identifier serves the role that the MoU would otherwise play in establishing access provenance.
- Base quality assurance on the product's published quality reports. In place of a bespoke Section 3.5.3 quality assessment, the NSO may rely on the product quality factsheet, validation report, or technical note published by the data provider, provided this is referenced in the account compilation metadata together with the product version. The Section 3.5.3 Step 3.3 fallback coherence tests still apply.
- Engage regional bodies as institutional partners in lieu of national research institutes. Examples include the Secretariat of the Pacific Regional Environment Programme (SPREP) and the Pacific Community (SPC) for the Pacific Islands, the Indian Ocean Global Ocean Observing System (IOGOOS), the Caribbean Community Climate Change Centre (5Cs), and the Intergovernmental Oceanographic Commission Sub-Commission for Africa and the Adjacent Island States (IOCAFRICA). Regional bodies can provide brokered access to data, technical support, and a community of practice that substitutes for a national institutional partner.
- Simplified metadata and provenance documentation. Pre-validated global products carry mature metadata; the NSO's documentation obligation is reduced to citing the product, version, access date, and the elements of the product's quality report relied upon in the account.
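The reduced documentation obligation described in the last item can be captured in a small record structure. The sketch below is a hypothetical layout, not a GOAP or provider schema: the class, its field names, and the placeholder values are all assumptions made for illustration.

```python
from dataclasses import dataclass, asdict
from datetime import date

@dataclass(frozen=True)
class ProductProvenance:
    """Minimal provenance record for a pre-validated global data product."""
    product_name: str
    product_identifier: str       # DOI or other persistent identifier
    version: str
    access_date: date
    quality_report: str           # citation of the provider's quality document
    quality_elements_used: tuple  # parts of that report the account relies on

    def missing_fields(self):
        """Names of fields left empty, so gaps are caught before release."""
        return [k for k, v in asdict(self).items() if v in ("", None, ())]

# Placeholder values only; the identifier is not a real DOI.
record = ProductProvenance(
    product_name="Example sea surface temperature product",
    product_identifier="10.xxxx/example-product",
    version="2021.1",
    access_date=date(2021, 3, 15),
    quality_report="Provider quality information document, 2021 edition",
    quality_elements_used=("validation against in situ data",),
)
print(record.missing_fields())
```

Storing one such record per input product alongside the account compilation metadata keeps the citation, version, access date, and quality evidence together in a form that can be checked automatically.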
NSOs operating in low-capacity contexts should not be deterred by the formal arrangements set out earlier in Section 3.7: those arrangements describe the upper end of the institutional spectrum, while this sub-section describes the lower end at which ocean accounting is equally feasible.
4. Acknowledgements
This Circular has been approved for public circulation and comment by the GOAP Technical Experts Group in accordance with the Circular Publication Procedure.
Authors: [To be confirmed]
Reviewers: [To be confirmed]
5. References
SEEA EA, para. 5.14-5.30. Ecosystem condition accounts record "the quality of an ecosystem" through biophysical and chemical characteristics.
Intergovernmental Oceanographic Commission. (2019). The Global Ocean Observing System 2030 Strategy. Paris: UNESCO-IOC. Available from: https://www.goosocean.org/
Simmonds, J., & MacLennan, D.N. (2005). Fisheries Acoustics: Theory and Practice, 2nd ed. Oxford: Blackwell Science. Acoustic surveys provide fishery-independent biomass estimates used in stock assessment.
SEEA EA, Table 5.3. The ecosystem condition typology identifies abiotic (physical state, chemical state), biotic (compositional, structural, functional), and landscape characteristics.
Hilborn, R., & Walters, C.J. (1992). Quantitative Fisheries Stock Assessment: Choice, Dynamics and Uncertainty. New York: Chapman and Hall. Stock assessment integrates fishery-dependent catch data with fishery-independent survey data.
Bravington, M.V., Skaug, H.J., and Anderson, E.C. (2016). "Close-kin mark-recapture." Statistical Science, 31(2), 259-274. https://doi.org/10.1214/16-STS552. Applied to southern bluefin tuna by Hillary, R.M. et al. (2018), Scientific Reports, 8, 13767.
Data-poor stock assessment methods referenced in Section 3.1.2: Hordyk, A., Ono, K., Valencia, S., Loneragan, N., & Prince, J. (2015). A novel length-based empirical estimation method of spawning potential ratio (SPR), and tests of its performance, for small-scale, data-poor fisheries. ICES Journal of Marine Science, 72(1), 217-228. doi:10.1093/icesjms/fsu004; Froese, R., Demirel, N., Coro, G., Kleisner, K. M., & Winker, H. (2017). Estimating fisheries reference points from catch and resilience. Fish and Fisheries, 18(3), 506-526. doi:10.1111/faf.12190; Dick, E. J., & MacCall, A. D. (2011). Depletion-Based Stock Reduction Analysis: a catch-based method for determining sustainable yields for data-poor fish stocks. Fisheries Research, 110(2), 331-341. doi:10.1016/j.fishres.2011.05.007. For a contemporary FAO synthesis of data-poor methods, see: Punt, A. E., Butterworth, D. S., de Moor, C. L., De Oliveira, J. A. A., & Haddon, M. (2014). Stock Assessment Methods Used by National and Regional Fisheries Management Organizations (FAO Fisheries and Aquaculture Technical Paper No. 569). FAO. fao.org/3/i3953e. SEEA EA para. 12.15 supports tiered reporting of asset accounts.
OBIS. (2025). Ocean Biodiversity Information System. Available from: https://obis.org/; GBIF. (2025). Global Biodiversity Information Facility. Available from: https://www.gbif.org/
FDES 2013, para. 1.32-1.33. Scientific research and special projects "can be used to address data gaps" but "often use terms and definitions that differ from those used in statistics."
Intergovernmental Oceanographic Commission. (2019). The Global Ocean Observing System 2030 Strategy. Paris: UNESCO-IOC.
Roemmich, D., et al. (2019). "On the future of Argo: A global, full-depth, multi-disciplinary array." Frontiers in Marine Science, 6, 439. https://doi.org/10.3389/fmars.2019.00439
SDG Framework. SDG Target 14.a: "Increase scientific knowledge, develop research capacity and transfer marine technology, taking into account the Intergovernmental Oceanographic Commission Criteria and Guidelines on the Transfer of Marine Technology."
GOOS. (2023). GOOS Essential Ocean Variables. Available from: https://www.goosocean.org/eov. Observation coverage is denser in developed country waters and coastal zones.
GOOS. (2023). GOOS Essential Ocean Variables. Available from: https://www.goosocean.org/eov
IUCN. (2020). IUCN Global Ecosystem Typology 2.0: Descriptive profiles for biomes and ecosystem functional groups. Gland: IUCN. https://doi.org/10.2305/IUCN.CH.2020.13.en
Thomsen, P.F., & Willerslev, E. (2015). "Environmental DNA - An emerging tool in conservation for monitoring past and present biodiversity." Biological Conservation, 183, 4-18.
GBIF. (2025). Global Biodiversity Information Facility. Available from: https://www.gbif.org/
OBIS. (2025). Ocean Biodiversity Information System. Available from: https://obis.org/
SEEA Technical Recommendations, para. 1.34. Fully spatial approaches "will generally be more resource intensive and implementation will require more ecological and geo-spatial expertise."
GCRMN. (2020). Status of Coral Reefs of the World: 2020. Available from: https://gcrmn.net/
SEEA EA, para. 12.15 on tiered approaches to measurement.
SEEA Biophysical Modelling, para. 370. "Remote sensing data and modelling approaches provides enormous opportunities to disseminate data with very short time-lags and high-frequency."
Copernicus Marine Service. (2025). Available from: https://marine.copernicus.eu/
SEEA Biophysical Modelling, para. 370.
SEEA Biophysical Modelling, para. 370. Remote sensing provides "very short time-lags and high-frequency" data but requires validation against in situ measurements.
SEEA Biophysical Modelling, para. 363. "The accuracy of modelled data can be assessed, although different approaches may be needed depending on the type of model used."
FDES 2013, para. 1.32. Scientific research data "are usually available at no or low cost."
FDES 2013, para. 1.33. Research data "often use terms and definitions that differ from those used in statistics", have "limited scope", and are "often available on a one-time basis only."
United Nations. (2019). United Nations National Quality Assurance Frameworks Manual for Official Statistics. New York: United Nations Statistics Division. Available from: https://unstats.un.org/unsd/methodology/dataquality/unnqaf-manual/
UN NQAF Manual, para. on relevance (Level D quality dimensions). Relevance is context-dependent and requires user consultation.
SEEA Biophysical Modelling, paras. 361-362 on accuracy of input data.
SEEA Biophysical Modelling, paras. 363-369 on model validation approaches.
GSGF v2, para. on Principle 4. "Interoperability between statistical and geospatial data and metadata standards is needed to overcome structural, semantic, and syntactic barriers."
SEEA Biophysical Modelling, para. 373. "Modelling approaches have been rapidly improving...This creates challenges in including data produced from biophysical models into accounts."
UN NQAF Manual, para. on accessibility (Level D quality dimensions). Accessibility covers both discoverability and ease of access.
National Academies of Sciences, Engineering, and Medicine. (2019). Reproducibility and Replicability in Science. Washington, DC: The National Academies Press. https://doi.org/10.17226/25303
National Academies of Sciences, Engineering, and Medicine. (2019). Reproducibility and Replicability in Science. Washington, DC: The National Academies Press. Ch. 3 distinguishes reproducibility (same data, same methods) from replicability (new data, same or similar methods).
SEEA Biophysical Modelling, para. 378. Transparency of workflow and uncertainty quantification are essential for reproducibility.
SEEA Biophysical Modelling, para. 360. Uncertainty matrices outline possible sources of uncertainty for each model.
SEEA Biophysical Modelling, para. 369. Model outputs should be seen as best estimates unless detailed validation has been conducted.
SEEA Biophysical Modelling, para. 379. "Tier 1 and Tier 2 approaches may be best for awareness raising or analysis of broad spatiotemporal trends."
UN NQAF Manual, recommendation on quality reporting. Statistical agencies should publish quality information including accuracy measures.
Wilkinson, M.D., et al. (2016). "The FAIR Guiding Principles for scientific data management and stewardship." Scientific Data, 3, 160018. https://doi.org/10.1038/sdata.2016.18
ISO 19115-1:2014. Geographic information - Metadata - Part 1: Fundamentals. International Organization for Standardization.
Darwin Core Maintenance Group. (2021). Darwin Core Quick Reference Guide. Available from: https://dwc.tdwg.org/terms/
CF Conventions Committee. (2023). CF Conventions and Metadata. Available from: https://cfconventions.org/
SDMX. (2021). SDMX 3.0 Technical Standards. Available from: https://sdmx.org/
SDMX. (2021). SDMX 3.0 Technical Standards. "The Statistical Data and Metadata Exchange (SDMX) initiative sets standards that can facilitate the exchange of statistical data and metadata using modern information technology."
International Hydrographic Organization. (2022). S-100 Universal Hydrographic Data Model. Edition 5.1.0. Monaco: IHO. Available from: https://iho.int/en/s-100-universal-hydrographic-data-model
SEEA Biophysical Modelling, para. 374 on data provenance systems.
SEEA Biophysical Modelling, para. 374. Data provenance systems "improve users' ability to understand the fitness for purpose of data sets."
UN NQAF Manual, Level D (Managing statistical outputs). Essential metadata elements for statistical dissemination.
SEEA Technical Recommendations, Box 1.2. Potential roles of National Statistical Offices in ecosystem accounting.
SEEA Technical Recommendations, paras. 1.60-1.61 on roles of NSOs and non-NSO agencies.
SEEA Technical Recommendations, para. 1.61. Agencies with geospatial and remote sensing expertise play important roles.
GSGF v2, Section on Principle 1. "Establishing strong communication and institutional collaboration mechanisms between NSOs and NGIAs is essential. This can be facilitated by, for example, country-level laws and policies, Memorandum of Understandings (MoUs), data sharing agreements, and other communities of practice." See also OECD (2007), Principles and Guidelines for Access to Research Data from Public Funding, Paris: OECD, as the governing framework for research-data sharing agreements.
SEEA Technical Recommendations, para. 1.57. "Given the need for involving many areas of expertise, an important aspect of implementation is the allocation of resources to co-ordination, data sharing and communication."
SEEA Technical Recommendations, paras. 1.55-1.62 on institutional arrangements for ecosystem accounting, including examples of multi-agency collaboration models.
SEEA Technical Note: Air Emission Accounts, para. 81. "Given that data may be acquired from a number of institutions or agencies, it is important to establish data transfer protocols."