A user connects an Oura Ring and an Apple Watch, and your app pulls HRV from both. Oura says 42. Apple says 38. A reasonable engineer averages them to 40 and moves on — and ships a bug. Those two numbers aren’t measuring the same thing: Apple reports SDNN, Oura reports RMSSD, computed over different time windows. Averaging them is like averaging a temperature in Celsius with one in Fahrenheit [1].
This is the normalization problem, and it’s distinct from — and upstream of — deduplication. Deduplication asks “is this the same event counted twice?” Normalization asks the more basic question: “do these two records even mean the same thing?” Until the answer is yes, every downstream score, trend, and comparison is built on sand.
This guide covers what normalization actually involves across wearable providers, the four layers where data diverges, and a strategy that holds up as you add sources.
Normalization vs deduplication
The two get conflated, but they solve different problems and run in a specific order:
- Normalization aligns representation: converting Garmin’s total_sleep_seconds, Oura’s sleep_duration (minutes), and Apple’s HKCategoryValueSleepAnalysis samples into one canonical field with one unit and one meaning [2].
- Deduplication aligns records: reconciling the same night of sleep, recorded by two devices, into a single truthful value.
You normalize first. Deduplication compares records to decide which to keep, and you can’t compare records that don’t share units, field names, and semantics. Normalize, then dedup — the reverse is one of the most common and most painful integration mistakes.
The four layers of mismatch
Wearable data diverges at four levels, from the superficial to the subtle. The easy ones are mechanical; the dangerous ones are semantic, because they look aligned when they aren’t.
1. Units and field names (syntactic)
The most visible layer, and the easiest to fix. The same fact arrives with different names and units:
| Metric | Garmin | Oura | Apple HealthKit |
|---|---|---|---|
| Sleep duration | total_sleep_seconds (s) | sleep_duration (min) | HKCategoryValueSleepAnalysis (enum samples) |
| Distance | meters | meters | meters or km by locale |
| Temperature | °C deviation | °C deviation | °C |
| Energy | kilocalories | — | kcal or kJ |
A normalization layer converts everything to one base unit per metric (seconds, meters, °C, kcal) and one field vocabulary. Mechanical, but unforgiving: a seconds-vs-minutes slip silently throws sleep off by a factor of 60.
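As a minimal sketch of that conversion step, the table below maps each incoming unit to the base unit for its metric. The metric keys, unit labels, and the to_base helper are illustrative names, not any provider’s API; the one design choice worth copying is that an unknown unit raises instead of passing through.

```python
# Conversion factors into the canonical base unit for each metric.
# Base units follow the text: seconds, meters, kcal.
# Metric keys and unit labels are hypothetical, chosen for this sketch.
TO_BASE = {
    "duration": {"s": 1.0, "min": 60.0, "h": 3600.0},
    "distance": {"m": 1.0, "km": 1000.0},
    "energy": {"kcal": 1.0, "kJ": 1 / 4.184},
}


def to_base(metric: str, value: float, unit: str) -> float:
    """Convert a value to the base unit, failing loudly on unknown units."""
    try:
        factor = TO_BASE[metric][unit]
    except KeyError:
        raise ValueError(f"no conversion for {metric!r} in unit {unit!r}") from None
    return value * factor


# 450 minutes of sleep becomes 27000 seconds.
sleep_seconds = to_base("duration", 450, "min")
```

Raising on an unrecognized unit is deliberate: a silent default is exactly how the factor-of-60 error gets into production.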
2. Metric semantics (the dangerous layer)
Here the field name matches but the definition doesn’t. Two providers both call a field “HRV,” and the numbers are not comparable [1][3]:
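- Apple Watch reports SDNN, the standard deviation of beat-to-beat intervals; Oura, WHOOP, Fitbit, and Samsung report RMSSD, the root mean square of successive differences between those intervals. Different formulas, usually over different measurement windows.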
- WHOOP computes HRV during your deepest sleep; Oura averages 5-minute samples across the whole night; Apple samples opportunistically, including during Mindfulness sessions [1].
The same is true of sleep stages: what one device labels “light” sleep overlaps differently with another’s “core,” and stage definitions vary by vendor — which is exactly why the same night looks different on an Apple Watch and an Oura Ring. Normalizing the number without normalizing the meaning produces values that line up in a database and lie to your users.
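To make the HRV mismatch concrete, here is a small sketch that computes both statistics from the same series of beat-to-beat intervals, using the textbook definitions of SDNN and RMSSD. The interval values are invented for illustration, and real vendor pipelines add filtering and windowing that this ignores.

```python
import math
import statistics


def sdnn(rr_ms):
    """Standard deviation of the beat-to-beat intervals, in ms."""
    return statistics.stdev(rr_ms)


def rmssd(rr_ms):
    """Root mean square of successive differences between intervals, in ms."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical intervals: a slow drift with small beat-to-beat changes.
rr = [800, 810, 820, 830, 840, 850, 860, 870]

# sdnn(rr) is about 24.5 ms, rmssd(rr) is 10.0 ms: same heartbeats,
# two different "HRV" values.
```

SDNN responds to the overall spread of the series, RMSSD only to changes between neighbouring beats, so no fixed factor converts one into the other.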
3. Data shape and granularity
Providers model time differently. Oura prioritizes daily summaries; Polar and others expose session-level detail; raw streams arrive at different sampling frequencies [2]. Some providers push via webhooks; others require polling. Normalizing means deciding on a canonical granularity (e.g., per-day and per-session records) and reshaping each source into it — aggregating where a source is too granular, and accepting coarser resolution where it only offers summaries.
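A rough sketch of the reshaping step, assuming a source that delivers session-level records and a canonical per-day granularity. The local_date and duration_s field names are hypothetical.

```python
from collections import defaultdict


def sessions_to_daily(sessions):
    """Sum session-level durations (seconds) into one total per local day."""
    totals = defaultdict(int)
    for session in sessions:
        totals[session["local_date"]] += session["duration_s"]
    return dict(totals)


# Hypothetical session records from a granular source.
sessions = [
    {"local_date": "2024-03-01", "duration_s": 1800},
    {"local_date": "2024-03-01", "duration_s": 1200},
    {"local_date": "2024-03-02", "duration_s": 3600},
]
daily = sessions_to_daily(sessions)
# daily == {"2024-03-01": 3000, "2024-03-02": 3600}
```

The reverse direction does not exist: a source that only sends daily summaries cannot be split back into sessions, so its records stay at the coarser resolution.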
4. Time, timezone, and identity
The quietest layer, and a frequent source of production incidents. Timestamps arrive in epoch seconds, ISO-8601, or local time; a record’s “day” depends on the user’s timezone at the moment of capture; and each provider has its own user, device, and source identifiers. Normalize timestamps to a single convention (UTC plus an explicit offset), and carry a consistent identity and provenance for every record — you’ll need both for deduplication later.
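One way to sketch the timestamp convention is a single function that accepts epoch seconds, offset-aware ISO-8601 strings, or naive local-time strings, and returns UTC alongside the user’s offset. The function name and the idea of passing the offset in minutes are assumptions of this example.

```python
from datetime import datetime, timedelta, timezone


def normalize_timestamp(raw, utc_offset_minutes):
    """Return (UTC ISO-8601 string, offset in minutes) for one timestamp.

    raw may be epoch seconds, an ISO-8601 string with an offset, or a
    naive local-time string. utc_offset_minutes is the user's offset at
    capture time and is kept so the local day can be rebuilt later.
    """
    if isinstance(raw, (int, float)):
        dt = datetime.fromtimestamp(raw, tz=timezone.utc)
    else:
        # Note: before Python 3.11, fromisoformat rejects a trailing "Z".
        dt = datetime.fromisoformat(raw)
        if dt.tzinfo is None:
            # Naive local time: attach the offset supplied by the caller.
            dt = dt.replace(tzinfo=timezone(timedelta(minutes=utc_offset_minutes)))
    return dt.astimezone(timezone.utc).isoformat(), utc_offset_minutes


# Three encodings of the same instant all normalize to
# "2023-11-14T22:13:20+00:00".
a = normalize_timestamp(1700000000, -300)
b = normalize_timestamp("2023-11-14T17:13:20", -300)
c = normalize_timestamp("2023-11-14T23:13:20+01:00", -300)
```

Keeping the offset next to the UTC value is what lets later stages decide which calendar day a record belongs to.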
A normalization strategy that holds up
The same framework works whether you support two providers or twenty.
1. Define a canonical schema
Pick one target model and make every source conform to it: one unit, one field name, and one type per metric. You can adopt an existing standard — Open mHealth schemas are designed for mobile and wearable data, while FHIR is clinically oriented and mapping consumer fitness metrics onto it adds complexity for little benefit [4][5]. Or define your own internal model. What matters is that it’s singular and explicit.
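If you define your own internal model, it might look something like the sketch below. Every field name here is an assumption for illustration, not taken from Open mHealth or FHIR; the point is one unit per metric, enforced in code.

```python
from dataclasses import dataclass
from typing import Optional

# One canonical unit per metric (hypothetical metric names).
CANONICAL_UNITS = {
    "sleep_duration": "s",
    "distance": "m",
    "energy": "kcal",
    "hrv": "ms",
}


@dataclass(frozen=True)
class CanonicalRecord:
    user_id: str
    metric: str
    value: Optional[float]  # None means "no data", never 0
    unit: str
    start_utc: str  # ISO-8601 in UTC
    utc_offset_minutes: int
    source: str  # provider that produced the value
    method: Optional[str] = None  # e.g. "sdnn" or "rmssd" for HRV


def validate(record: CanonicalRecord) -> CanonicalRecord:
    """Reject records whose metric or unit is outside the canonical schema."""
    expected = CANONICAL_UNITS.get(record.metric)
    if expected is None:
        raise ValueError(f"unknown metric {record.metric!r}")
    if record.unit != expected:
        raise ValueError(f"{record.metric} must be in {expected}, got {record.unit!r}")
    return record
```

Running validate at the boundary of the pipeline means a record in the wrong unit is rejected before any score or trend can consume it.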
2. Write a per-provider mapping layer
For each provider, a thin adapter converts units, renames fields, and maps enumerations (sleep-stage taxonomies, activity types) into the canonical vocabulary. Isolate this per source so that when a vendor changes its API — and they do — the blast radius is one adapter, not your whole pipeline.
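A minimal sketch of that adapter layout, using the field names and units from the table earlier in this guide. The payload shapes are simplified assumptions; real responses are nested and carry far more fields.

```python
def from_garmin(payload):
    """Garmin adapter: the duration already arrives in seconds."""
    return {
        "metric": "sleep_duration",
        "value": payload["total_sleep_seconds"],
        "unit": "s",
        "source": "garmin",
    }


def from_oura(payload):
    """Oura adapter: convert minutes to seconds."""
    return {
        "metric": "sleep_duration",
        "value": payload["sleep_duration"] * 60,
        "unit": "s",
        "source": "oura",
    }


# One adapter per provider; a vendor API change touches only its entry.
ADAPTERS = {"garmin": from_garmin, "oura": from_oura}


def normalize(provider, payload):
    """Dispatch to the provider's adapter, failing loudly on unknown providers."""
    if provider not in ADAPTERS:
        raise ValueError(f"no adapter registered for {provider!r}")
    return ADAPTERS[provider](payload)


garmin_rec = normalize("garmin", {"total_sleep_seconds": 27000})
oura_rec = normalize("oura", {"sleep_duration": 450})
# Both records now carry value 27000 in seconds.
```

Nothing outside an adapter knows a provider’s field names, which is what keeps the blast radius of an upstream change to a single function.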
3. Normalize semantics, not just syntax
This is the step that separates a real normalization layer from a rename script. Where a metric isn’t comparable across sources (HRV’s SDNN vs RMSSD, divergent sleep-stage definitions), don’t force it into one number. Tag each value with its source and method, and either expose the metric type or commit to a single source per user per metric. Aligning the schema is necessary; aligning the meaning is what makes the data trustworthy.
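In code, this can be as simple as refusing to combine values whose method differs. The sketch below uses a hypothetical HrvReading type and two illustrative helpers: one that averages only comparable readings, and one that commits to a single source.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class HrvReading:
    value_ms: float
    source: str
    method: str  # "sdnn" or "rmssd"


def combine(readings):
    """Average readings only when they all share one method."""
    methods = {r.method for r in readings}
    if len(methods) > 1:
        raise ValueError(f"non-comparable HRV methods: {sorted(methods)}")
    return mean(r.value_ms for r in readings)


def pick_primary(readings, preferred_source):
    """Alternative policy: commit to a single source per user per metric."""
    for reading in readings:
        if reading.source == preferred_source:
            return reading
    return None


readings = [
    HrvReading(42.0, "oura", "rmssd"),
    HrvReading(38.0, "apple", "sdnn"),
]
# combine(readings) raises ValueError instead of returning 40.0.
primary = pick_primary(readings, "oura")
```

The averaging bug from the opening example becomes an exception at the point where the mistake would have been made.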
4. Preserve provenance and method
Never discard which source and which algorithm produced a value. You need it to flag non-comparable metrics, to weight sources, and — critically — to deduplicate afterward. Normalization that throws away provenance to produce a “clean” value destroys the information the next stage depends on.
5. Normalize before you deduplicate
With every source mapped into the canonical schema, deduplication and reconciliation finally have aligned records to work on. Run them on the normalized set, not the raw one.
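The ordering can be sketched end to end with two records for the same night. The record shape and the source-priority rule are assumptions for this example; real reconciliation logic is usually richer than "prefer one source".

```python
def normalize(record):
    """Convert a raw record to canonical seconds, keeping its provenance."""
    factor = {"s": 1, "min": 60}[record["unit"]]
    return {
        "user": record["user"],
        "night": record["night"],
        "sleep_s": record["value"] * factor,
        "source": record["source"],
    }


def dedup(records, priority):
    """Keep one record per (user, night), preferring higher-priority sources."""
    rank = {source: i for i, source in enumerate(priority)}
    best = {}
    for rec in records:
        key = (rec["user"], rec["night"])
        if key not in best or rank[rec["source"]] < rank[best[key]["source"]]:
            best[key] = rec
    return list(best.values())


raw = [
    {"user": "u1", "night": "2024-03-01", "value": 27000, "unit": "s", "source": "garmin"},
    {"user": "u1", "night": "2024-03-01", "value": 452, "unit": "min", "source": "oura"},
]

# Normalize first, then dedup on the aligned records.
result = dedup([normalize(r) for r in raw], priority=["oura", "garmin"])
# result holds one record: 27120 seconds from oura.
```

The dedup step reads the source field to choose a winner, which is why normalization has to keep provenance on every record.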
The edge cases that will bite you
- Enum drift. A vendor renames or adds a sleep stage in a minor API update, and your mapping silently drops the new value into a default bucket. Validate against an allowlist and alert on unknowns.
- Unit ambiguity by locale. Distance and energy can switch units based on account or device locale. Don’t assume a unit — read it from the payload or pin it per provider.
- Missing vs zero. A provider that omits a field is not reporting zero. Collapsing “no data” into 0 corrupts averages and trends. Preserve null.
- Timezone-dependent days. Aggregate in the wrong zone and one night’s sleep splits across two calendar days, manufacturing both a gap and a duplicate.
- Schema version skew. Two users on different app or firmware versions can return different shapes from the “same” provider. Version your adapters.
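Three of these edge cases fit in a few lines each. The sketch below assumes a hypothetical canonical stage vocabulary and illustrative helper names; the logger name is arbitrary.

```python
import logging
from datetime import datetime, timedelta, timezone
from statistics import mean

log = logging.getLogger("normalizer")

# Hypothetical canonical sleep-stage vocabulary used as an allowlist.
KNOWN_STAGES = {"awake", "light", "deep", "rem"}


def map_stage(raw):
    """Return the canonical stage, or None (with a warning) for unknown values."""
    stage = raw.strip().lower()
    if stage not in KNOWN_STAGES:
        log.warning("unknown sleep stage %r, not mapped", raw)
        return None
    return stage


def mean_ignoring_missing(values):
    """Average only the values that exist; None is 'no data', not zero."""
    present = [v for v in values if v is not None]
    return mean(present) if present else None


def local_day(utc_dt, utc_offset_minutes):
    """Calendar day of an instant in the user's own offset, not in UTC."""
    local = utc_dt.astimezone(timezone(timedelta(minutes=utc_offset_minutes)))
    return local.date()


# A missing reading stays missing: the mean of [60, None, 62] is 61,
# whereas coercing None to 0 would give about 40.7.
resting_hr = mean_ignoring_missing([60, None, 62])

# 04:30 UTC on March 2 is still the evening of March 1 at UTC-8.
bedtime = datetime(2024, 3, 2, 4, 30, tzinfo=timezone.utc)
night = local_day(bedtime, -480)
```

An unknown stage returns None and logs a warning rather than falling into a default bucket, so enum drift shows up in monitoring instead of in user-facing numbers.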
What this means for builders
Normalization is the layer everything else stands on. A health score, a readiness trend, a cohort comparison — each silently assumes the inputs share units and meaning. When they don’t, the output isn’t noisy, it’s wrong in a way that looks right, which is the worst failure mode in a data product.
The honest assessment of the work: a two-provider integration is a weekend; the real cost is the long tail. Every new source is another adapter, every vendor API change is maintenance, and the semantic cases — HRV metric types, sleep-stage taxonomies, granularity mismatches — never fully stop demanding judgment. It’s also entirely undifferentiated: no user has ever chosen a product because its unit conversion was tidy. (For the market context behind this fragmentation, see the state of wearable health data.)
That combination — foundational, unglamorous, and ever-growing — is why normalization is a strong candidate to push below your product line. A layer that ingests across providers, maps every source into one canonical schema, keeps provenance intact, and hands you a single comparable value per metric is the boundary that lets your team build on health data instead of perpetually translating it. (It’s the layer we work on at Sahha.) However you draw that line, the goal is the same: turn a 42 from Oura and a 38 from Apple into a number you actually understand.
References
[1] Empirical Health. (2025). How different wearables measure HRV: SDNN vs RMSSD. https://www.empirical.health/blog/how-wearables-measure-hrv/
[2] Thryve. (2026). How to Integrate Multiple Wearable APIs: A Complete Developer Guide. https://www.thryve.health/blog/wearable-api-integration-guide-for-developers
[3] Terra. (2025). How HRV Actually Works. https://tryterra.co/research/how-hrv-actually-works
[4] Open mHealth. Schema Library. https://www.openmhealth.org/documentation/#/schema-docs/schema-library
[5] Frontiers in Digital Health. (2025). Streamlining wearable data integration for EHDS: a case study on advancing healthcare interoperability using Garmin devices and FHIR. https://doi.org/10.3389/fdgth.2025.1636775