How do you normalize wearable data across providers?

Normalization means mapping every provider's data into one canonical schema: convert units to a common base (seconds, meters, °C), rename fields to a shared vocabulary, and align metric semantics so the same field means the same thing across sources. The reliable approach is to define a canonical model per metric, write a per-provider mapping layer that converts units and field names, preserve provenance (which source and which algorithm produced each value), and align semantics — not just syntax — before you do anything else with the data. Normalize first, then deduplicate; you can't reconcile records you haven't aligned.

Why do Garmin, Oura, and Fitbit return data in different formats?

Because every wearable vendor designed its API independently, with its own units, field names, data shapes, authentication, and update frequencies. Garmin returns sleep as total_sleep_seconds, Oura as sleep_duration in minutes, and Apple HealthKit as HKCategoryValueSleepAnalysis category samples: three formats for the same fact. Some providers push daily summaries; others expose session-level detail. There is no shared standard, so aligning them is the developer's job.

Why isn't an Apple Watch HRV comparable to an Oura HRV?

Because they aren't the same metric. Apple Watch reports HRV as SDNN, while Oura, WHOOP, and most others report RMSSD. These are two different calculations over different time windows, so an Apple Watch HRV of 35 ms and an Oura HRV of 35 ms are not the same physiological quantity. Even among RMSSD devices the window differs: WHOOP computes HRV during deepest sleep, while Oura averages 5-minute samples across the whole night. Normalizing wearable data means flagging metrics like this as non-comparable rather than blending them into one number.

Should you normalize or deduplicate wearable data first?

Normalize first. Deduplication reconciles overlapping records of the same event into one truthful value, but you can't compare or reconcile records until they share units, field names, and semantics. Normalizing into a canonical schema is the prerequisite; deduplication runs on top of the aligned data. Doing it the other way around is a common and painful mistake.

How to Normalize Wearable Data Across Providers: A Developer's Guide

A user connects an Oura Ring and an Apple Watch, and your app pulls HRV from both. Oura says 42. Apple says 38. A reasonable engineer averages them to 40 and moves on — and ships a bug. Those two numbers aren’t measuring the same thing: Apple reports SDNN, Oura reports RMSSD, computed over different time windows. Averaging them is like averaging a temperature in Celsius with one in Fahrenheit [1].

This is the normalization problem, and it’s distinct from — and upstream of — deduplication. Deduplication asks “is this the same event counted twice?” Normalization asks the more basic question: “do these two records even mean the same thing?” Until the answer is yes, every downstream score, trend, and comparison is built on sand.

This guide covers what normalization actually involves across wearable providers, the four layers where data diverges, and a strategy that holds up as you add sources.

Normalization vs deduplication

The two get conflated, but they solve different problems and run in a specific order:

Normalization aligns representation — converting Garmin’s total_sleep_seconds, Oura’s sleep_duration (minutes), and Apple’s HKCategoryValueSleepAnalysis samples into one canonical field with one unit and one meaning [2].
Deduplication aligns records — reconciling the same night of sleep, recorded by two devices, into a single truthful value.

You normalize first. Deduplication compares records to decide which to keep, and you can’t compare records that don’t share units, field names, and semantics. Normalize, then dedup — the reverse is one of the most common and most painful integration mistakes.

The four layers of mismatch

Wearable data diverges at four levels, from the superficial to the subtle. The easy ones are mechanical; the dangerous ones are semantic, because they look aligned when they aren’t.

1. Units and field names (syntactic)

The most visible layer, and the easiest to fix. The same fact arrives with different names and units:

Metric	Garmin	Oura	Apple HealthKit
Sleep duration	`total_sleep_seconds` (s)	`sleep_duration` (min)	`HKCategoryValueSleepAnalysis` (enum samples)
Distance	meters	meters	meters or km by locale
Temperature	°C deviation	°C deviation	°C
Energy	kilocalories	—	kcal or kJ

A normalization layer converts everything to one base unit per metric (seconds, meters, °C, kcal) and one field vocabulary. Mechanical, but unforgiving: a seconds-vs-minutes slip silently inflates sleep 60×.
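As a rough illustration of the unit step, here is a minimal sketch that assumes durations arrive in the units listed in the table above. The names `UNIT_TO_SECONDS` and `to_seconds` are made up for this example and are not part of any provider SDK.

```python
# Conversion factors to the canonical base unit (seconds).
UNIT_TO_SECONDS = {"s": 1, "min": 60, "h": 3600}


def to_seconds(value, unit):
    """Convert a duration to seconds; fail loudly on an unknown unit."""
    if value is None:
        return None  # missing stays missing, it is never turned into 0
    try:
        return value * UNIT_TO_SECONDS[unit]
    except KeyError:
        raise ValueError(f"unknown duration unit: {unit!r}") from None


# The same 7.5-hour night, reported in two different units.
garmin_sleep = to_seconds(27000, "s")   # seconds in, seconds out
oura_sleep = to_seconds(450, "min")     # minutes in, seconds out
```

Raising on an unrecognized unit, rather than guessing, is one way to keep a seconds-vs-minutes slip from passing through quietly.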

2. Metric semantics (the dangerous layer)

Here the field name matches but the definition doesn’t. Two providers both call a field “HRV,” and the numbers are not comparable [1][3]:

Apple Watch reports SDNN; Oura, WHOOP, Fitbit, and Samsung report RMSSD. These are different formulas, computed over different beat-to-beat windows.
WHOOP computes HRV during your deepest sleep; Oura averages 5-minute samples across the whole night; Apple samples opportunistically, including during Mindfulness sessions [1].

The same is true of sleep stages: what one device labels “light” sleep overlaps differently with another’s “core,” and stage definitions vary by vendor — which is exactly why the same night looks different on an Apple Watch and an Oura Ring. Normalizing the number without normalizing the meaning produces values that line up in a database and lie to your users.

The HRV trap: An Apple Watch HRV of 35 ms and an Oura HRV of 35 ms are not the same physiological quantity — one is SDNN, the other RMSSD. Don’t store them in the same column as if they’re interchangeable, and never average across them. Either keep the metric type alongside the value, or pick one source per user and don’t mix.
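One possible way to "keep the metric type alongside the value" is sketched below. The `HrvReading` type, its field names, and the `average_hrv` helper are hypothetical; the point is only that the method travels with the number and that mixing methods is rejected rather than averaged.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class HrvReading:
    value_ms: float
    method: str   # "SDNN" or "RMSSD"
    source: str   # e.g. "apple_watch", "oura"


def average_hrv(readings):
    """Average readings only when they all share one HRV method."""
    methods = {r.method for r in readings}
    if len(methods) != 1:
        raise ValueError(f"cannot average across HRV methods: {sorted(methods)}")
    return mean(r.value_ms for r in readings)
```

With this shape, averaging an Oura RMSSD reading with an Apple Watch SDNN reading raises an error instead of returning a plausible-looking number.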

3. Data shape and granularity

Providers model time differently. Oura prioritizes daily summaries; Polar and others expose session-level detail; raw streams arrive at different sampling frequencies [2]. Some providers push via webhooks; others require polling. Normalizing means deciding on a canonical granularity (e.g., per-day and per-session records) and reshaping each source into it — aggregating where a source is too granular, and accepting coarser resolution where it only offers summaries.
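The reshaping step might look something like the sketch below, which rolls session-level records up into per-day records. The record keys (`user_id`, `local_date`, `duration_s`) are assumptions for illustration, and the sessions are assumed to be already converted to seconds.

```python
from collections import defaultdict


def sessions_to_daily(sessions):
    """Sum session durations into one record per (user, local day)."""
    totals = defaultdict(int)
    for s in sessions:
        totals[(s["user_id"], s["local_date"])] += s["duration_s"]
    return [
        {"user_id": user, "local_date": day, "duration_s": total, "granularity": "day"}
        for (user, day), total in sorted(totals.items())
    ]


sessions = [
    {"user_id": "u1", "local_date": "2025-03-01", "duration_s": 1800},
    {"user_id": "u1", "local_date": "2025-03-01", "duration_s": 1200},
    {"user_id": "u1", "local_date": "2025-03-02", "duration_s": 600},
]
daily = sessions_to_daily(sessions)
```

The reverse direction is not available: a source that only provides a daily summary cannot be split back into sessions, so the canonical record should say which granularity it carries.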

4. Time, timezone, and identity

The quietest layer, and a frequent source of production incidents. Timestamps arrive in epoch seconds, ISO-8601, or local time; a record’s “day” depends on the user’s timezone at the moment of capture; and each provider has its own user, device, and source identifiers. Normalize timestamps to a single convention (UTC plus an explicit offset), and carry a consistent identity and provenance for every record — you’ll need both for deduplication later.
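A minimal sketch of the "UTC plus an explicit offset" convention, using only Python's standard library. The function names are hypothetical, and the sketch assumes the user's UTC offset at capture time is available from the provider payload or from your own records.

```python
from datetime import datetime, timedelta, timezone


def normalize_timestamp(raw, utc_offset_minutes):
    """Return (UTC datetime, offset in minutes) from epoch seconds or ISO-8601."""
    if isinstance(raw, (int, float)):
        # Epoch seconds are defined relative to UTC already.
        dt = datetime.fromtimestamp(raw, tz=timezone.utc)
    else:
        dt = datetime.fromisoformat(raw)
        if dt.tzinfo is None:
            # Naive local time: attach the user's offset at capture.
            dt = dt.replace(tzinfo=timezone(timedelta(minutes=utc_offset_minutes)))
    return dt.astimezone(timezone.utc), utc_offset_minutes


def local_day(utc_dt, utc_offset_minutes):
    """Calendar day the user experienced, derived from UTC plus the stored offset."""
    return (utc_dt + timedelta(minutes=utc_offset_minutes)).date()


# 23:30 local time at UTC-5 falls on the next calendar day in UTC.
utc_dt, offset = normalize_timestamp("2025-03-01T23:30:00", -300)
day = local_day(utc_dt, offset)
```

Storing the offset next to the UTC instant is what lets you recover the user's local day later without guessing their timezone.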

A normalization strategy that holds up

The same framework works whether you support two providers or twenty.

1. Define a canonical schema

Pick one target model and make every source conform to it: one unit, one field name, and one type per metric. You can adopt an existing standard — Open mHealth schemas are designed for mobile and wearable data, while FHIR is clinically oriented and mapping consumer fitness metrics onto it adds complexity for little benefit [4][5]. Or define your own internal model. What matters is that it’s singular and explicit.
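If you define your own internal model, it could be as small as the sketch below. The metric names, the unit table, and the `CanonicalRecord` fields are all hypothetical choices for illustration; they are not taken from Open mHealth or FHIR.

```python
from dataclasses import dataclass
from typing import Optional

# One canonical unit per metric (hypothetical internal vocabulary).
CANONICAL_UNITS = {
    "sleep_duration": "s",
    "distance": "m",
    "temperature_deviation": "degC",
    "active_energy": "kcal",
}


@dataclass(frozen=True)
class CanonicalRecord:
    user_id: str
    metric: str
    value: Optional[float]        # None means "no data", not zero
    unit: str
    start_utc: str                # ISO-8601 timestamp in UTC
    source: str                   # provider that produced the value
    method: Optional[str] = None  # algorithm, e.g. "RMSSD"

    def __post_init__(self):
        expected = CANONICAL_UNITS.get(self.metric)
        if expected is None:
            raise ValueError(f"unknown metric: {self.metric}")
        if self.unit != expected:
            raise ValueError(f"{self.metric} must be in {expected}, got {self.unit}")
```

Rejecting a record at construction time keeps the "one unit per metric" rule enforced in one place instead of relying on every adapter to remember it.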

2. Write a per-provider mapping layer

For each provider, a thin adapter converts units, renames fields, and maps enumerations (sleep-stage taxonomies, activity types) into the canonical vocabulary. Isolate this per source so that when a vendor changes its API — and they do — the blast radius is one adapter, not your whole pipeline.
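A sketch of what isolated adapters might look like follows. The field names `total_sleep_seconds` and `sleep_duration` are the ones cited earlier in this guide; the payload shapes, function names, and the `ADAPTERS` registry are assumptions, so check each vendor's current API documentation before relying on them.

```python
def _minutes_to_seconds(minutes):
    # Preserve "no data" as None instead of turning it into 0.
    return None if minutes is None else minutes * 60


def from_garmin(payload):
    # Assumed to report sleep in seconds already.
    return {
        "metric": "sleep_duration",
        "value_s": payload.get("total_sleep_seconds"),
        "source": "garmin",
    }


def from_oura(payload):
    # Assumed to report sleep in minutes.
    return {
        "metric": "sleep_duration",
        "value_s": _minutes_to_seconds(payload.get("sleep_duration")),
        "source": "oura",
    }


# One adapter per provider; a vendor API change touches one function only.
ADAPTERS = {"garmin": from_garmin, "oura": from_oura}


def normalize(provider, payload):
    try:
        adapter = ADAPTERS[provider]
    except KeyError:
        raise ValueError(f"no adapter registered for {provider!r}") from None
    return adapter(payload)
```

Adding a provider then means writing one more function and one more registry entry, with nothing downstream needing to know about it.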

3. Normalize semantics, not just syntax

This is the step that separates a real normalization layer from a rename script. Where a metric isn’t comparable across sources (HRV’s SDNN vs RMSSD, divergent sleep-stage definitions), don’t force it into one number. Tag each value with its source and method, and either expose the metric type or commit to a single source per user per metric. Aligning the schema is necessary; aligning the meaning is what makes the data trustworthy.
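Committing to a single source per user per metric can be sketched as a simple filter. The `pin_source` function and the record keys are hypothetical, and the fallback rule (first source seen wins when nothing is configured) is an assumption made for this example, not a recommendation from the sources cited here.

```python
def pin_source(records, pinned=None):
    """Keep only records from one source per (user, metric).

    `pinned` maps (user_id, metric) to a source name. When a pair has no
    configured source, the first source seen for it is used.
    """
    chosen = dict(pinned or {})
    kept = []
    for r in records:
        key = (r["user_id"], r["metric"])
        source = chosen.setdefault(key, r["source"])
        if r["source"] == source:
            kept.append(r)
    return kept


records = [
    {"user_id": "u1", "metric": "hrv", "source": "oura", "method": "RMSSD", "value": 42},
    {"user_id": "u1", "metric": "hrv", "source": "apple_watch", "method": "SDNN", "value": 38},
]
oura_only = pin_source(records, {("u1", "hrv"): "oura"})
```

The dropped records do not have to be deleted; keeping them in storage with their method tag preserves the option of exposing both series separately later.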

4. Preserve provenance and method

Never discard which source and which algorithm produced a value. You need it to flag non-comparable metrics, to weight sources, and — critically — to deduplicate afterward. Normalization that throws away provenance to produce a “clean” value destroys the information the next stage depends on.

5. Normalize before you deduplicate

With every source mapped into the canonical schema, deduplication and reconciliation finally have aligned records to work on. Run them on the normalized set, not the raw one.
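The ordering can be sketched end to end as below. The `normalize` step uses assumed unit labels, and the `deduplicate` rule (a fixed source priority list) is only a placeholder to show where reconciliation sits in the pipeline; real reconciliation logic is usually more involved.

```python
def normalize(raw):
    """Map a raw record to canonical seconds (unit labels are assumptions)."""
    factor = {"s": 1, "min": 60}[raw["unit"]]
    return {
        "user_id": raw["user_id"],
        "night": raw["night"],
        "sleep_s": raw["value"] * factor,
        "source": raw["source"],  # provenance survives normalization
    }


def deduplicate(records, priority):
    """Keep one record per (user, night), preferring earlier entries in `priority`."""
    best = {}
    for r in records:
        key = (r["user_id"], r["night"])
        current = best.get(key)
        if current is None or priority.index(r["source"]) < priority.index(current["source"]):
            best[key] = r
    return list(best.values())


raw = [
    {"user_id": "u1", "night": "2025-03-01", "value": 27000, "unit": "s", "source": "garmin"},
    {"user_id": "u1", "night": "2025-03-01", "value": 445, "unit": "min", "source": "oura"},
]
canonical = [normalize(r) for r in raw]              # step 1: align
merged = deduplicate(canonical, ["oura", "garmin"])  # step 2: reconcile
```

Run in the opposite order, the dedup step would be comparing 27000 against 445 with no way to tell that both describe roughly the same night.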

The edge cases that will bite you

Enum drift. A vendor renames or adds a sleep stage in a minor API update, and your mapping silently drops the new value into a default bucket. Validate against an allowlist and alert on unknowns.
Unit ambiguity by locale. Distance and energy can switch units based on account or device locale. Don’t assume a unit — read it from the payload or pin it per provider.
Missing vs zero. A provider that omits a field is not reporting zero. Collapsing “no data” into 0 corrupts averages and trends. Preserve null.
Timezone-dependent days. Aggregate in the wrong zone and one night’s sleep splits across two calendar days, manufacturing both a gap and a duplicate.
Schema version skew. Two users on different app or firmware versions can return different shapes from the “same” provider. Version your adapters.
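Two of these cases, enum drift and missing vs zero, lend themselves to a short sketch. The stage labels in `STAGE_MAP` belong to a made-up vendor, and the logger name and function names are hypothetical.

```python
import logging

logger = logging.getLogger("normalizer")

# Allowlist for one hypothetical vendor's sleep-stage labels.
STAGE_MAP = {"AWAKE": "awake", "LIGHT": "light", "DEEP": "deep", "REM": "rem"}


def map_stage(raw_stage):
    """Map a vendor stage to the canonical one; surface anything unrecognized."""
    try:
        return STAGE_MAP[raw_stage]
    except KeyError:
        logger.warning("unknown sleep stage %r", raw_stage)
        return "unmapped"  # explicit bucket, never a silent default


def mean_ignoring_missing(values):
    """Average only the values that exist; None means no data, not zero."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None
```

For example, `mean_ignoring_missing([60, None, 80])` gives 70.0, whereas treating the missing night as 0 would drag the average down to about 46.7.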

What this means for builders

Normalization is the layer everything else stands on. A health score, a readiness trend, a cohort comparison — each silently assumes the inputs share units and meaning. When they don’t, the output isn’t noisy, it’s wrong in a way that looks right, which is the worst failure mode in a data product.

The honest assessment of the work: a two-provider integration is a weekend; the real cost is the long tail. Every new source is another adapter, every vendor API change is maintenance, and the semantic cases — HRV metric types, sleep-stage taxonomies, granularity mismatches — never fully stop demanding judgment. It’s also entirely undifferentiated: no user has ever chosen a product because its unit conversion was tidy. (For the market context behind this fragmentation, see the state of wearable health data.)

That combination — foundational, unglamorous, and ever-growing — is why normalization is a strong candidate to push below your product line. A layer that ingests across providers, maps every source into one canonical schema, keeps provenance intact, and hands you a single comparable value per metric is the boundary that lets your team build on health data instead of perpetually translating it. (It’s the layer we work on at Sahha.) However you draw that line, the goal is the same: turn a 42 from Oura and a 38 from Apple into a number you actually understand.

References

[1] Empirical Health. (2025). How different wearables measure HRV: SDNN vs RMSSD. https://www.empirical.health/blog/how-wearables-measure-hrv/
[2] Thryve. (2026). How to Integrate Multiple Wearable APIs — A Complete Developer Guide. https://www.thryve.health/blog/wearable-api-integration-guide-for-developers
[3] Terra. (2025). How HRV Actually Works. https://tryterra.co/research/how-hrv-actually-works
[4] Open mHealth. Schema Library. https://www.openmhealth.org/documentation/#/schema-docs/schema-library
[5] Frontiers in Digital Health. (2025). Streamlining wearable data integration for EHDS: a case study on advancing healthcare interoperability using Garmin devices and FHIR. https://doi.org/10.3389/fdgth.2025.1636775