Sahha Research
How we handle sleep data in our technology
Sahha expertly and efficiently tackles multiple challenges, from collecting critical sleep data logs from multiple sources to delivering accurate biomarkers and scores. This makes it a hassle-free experience for businesses integrating our offerings.
Quality sleep is an important factor in improving mental health. According to research (1), good sleep promotes resilience to mental health issues such as depression and anxiety. Adequate sleep helps the brain process emotional information, which is vital for emotional regulation and mental well-being.
Efficient Sleep Data Collection and Integration
Harnessing Diverse Device Capabilities
Different devices have unique methods of tracking sleep:
Sensors and Algorithms: Mobile devices and wearables track sleep using a combination of sensors and algorithms. Most use accelerometers to detect movement, which helps distinguish sleep periods from wake periods. Some also include heart rate monitors and, in some cases, blood oxygen sensors to provide more comprehensive sleep data.
Device Placement: Even the same device with the same sensors can provide different readings depending on its placement. Wearables are generally more accurate because they are worn directly on the body, whereas mobile phones placed on a mattress or beside the bed can only provide basic movement data and are incapable of accurately recording sleep stages.
Device Comfort: Sound sleep depends on the person being comfortable. A wearable therefore needs to be comfortable enough that wearing it does not affect sleep quality.
Sleep Data Variety
We primarily capture sleep stages to understand sleep quality. Determining accurate sleep stages is crucial for understanding mental and physical wellbeing:
Different Sleep Stages: Sleep is divided into various stages including (2) -
In Bed - The total time spent in bed, including all sleep stages and periods of wakefulness. It is the entire duration from when a person lies down to sleep until they get out of bed in the morning.
Awake - Periods during the night when an individual is conscious and aware of their surroundings, including brief awakenings that may not be remembered. It can occur between sleep cycles or within sleep stages and is often short.
Asleep - The state in which the body and mind are in a restful condition, characterized by reduced consciousness and physical activity. It includes all the stages of sleep during the sleep cycle.
Rapid Eye Movement or REM - A sleep stage characterized by vivid dreaming, rapid eye movements, increased brain activity, and temporary muscle paralysis. It plays a crucial role in emotional regulation, memory consolidation, and brain development.
Light Sleep - The stages of sleep where the body transitions from wakefulness to deeper sleep. It is easier to wake up during these stages. It is important for mental and physical recovery, although it is not as restorative as deep sleep.
Deep Sleep - The most restorative phase of sleep, characterized by slow brain waves and a reduced heart rate. It is crucial for physical recovery, growth, and immune function.
The stages above are listed in ascending order of their overall restorative effect on the body.
People track sleep in different ways: some use a smartphone, some use a wearable, and many use several devices at the same time. Sahha supports all of these setups.
Mobile vs. Wearables: Mobile phones are limited to detecting basic sleep patterns through movement. In contrast, wearables can detect sleep stages by monitoring physiological signals such as heart rate variability and blood oxygen levels, providing a comprehensive view of sleep quality.
Streamlined Data Ingestion and Cleanup
Real-time Sleep Data Integration
Sleep data from multiple devices is ingested into a single sleep table. Each row is distinguished by its collection source and sleep stage, and all rows share a uniform format designed for consistency and scalability. This table stores hundreds of millions of rows from tens of thousands of users. Developers are expected to send sleep data that follows our schema, which promotes scalability, maintainability, data uniformity, and standardization.
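To make the idea of a uniform row concrete, here is a minimal sketch of what one sleep-table row could look like. The class names, field names, and stage string values are hypothetical and chosen for illustration; they are not Sahha's published schema. The point is that every row carries the same fields and rejects malformed timings at ingestion time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class SleepStage(str, Enum):
    """Stages described above; the string values are illustrative, not Sahha's schema."""
    IN_BED = "in_bed"
    AWAKE = "awake"
    ASLEEP = "asleep"
    REM = "rem"
    LIGHT = "light"
    DEEP = "deep"


@dataclass(frozen=True)
class SleepEvent:
    """One row of a hypothetical sleep table: a single stage interval from one source."""
    profile_id: str     # pseudonymous profile ID, never personal data
    source: str         # collection source, e.g. "phone" or "wearable" (illustrative)
    stage: SleepStage
    start: datetime     # timezone-aware start of the interval
    end: datetime       # timezone-aware end of the interval
    value: float = 0.0  # source-specific measurement attached to the event

    def __post_init__(self) -> None:
        # Reject rows that would corrupt downstream aggregation.
        if self.start.tzinfo is None or self.end.tzinfo is None:
            raise ValueError("timestamps must be timezone-aware")
        if self.end <= self.start:
            raise ValueError("end must be after start")

    @property
    def duration(self) -> timedelta:
        return self.end - self.start
```

Validating at construction time means every later stage (filtering, de-duplication, aggregation) can assume well-formed, timezone-aware intervals.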
Ensuring Data Quality through Cleanup
Pre-filtering: Data points from source tables are filtered to a preset time window, so that enough data is available for creating features, biomarkers, and scores while the pipeline stays performant.
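A minimal sketch of window-based pre-filtering follows. It assumes events are (start, end) intervals and keeps any event that overlaps the look-back window, so a night that straddles the window boundary is not cut off. The `Event` type, the `prefilter` name, and the 7-day default are hypothetical placeholders, not Sahha's configured values.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, NamedTuple


class Event(NamedTuple):
    profile_id: str
    stage: str
    start: datetime
    end: datetime


def prefilter(events: Iterable[Event], now: datetime,
              window: timedelta = timedelta(days=7)) -> List[Event]:
    """Keep events that overlap the look-back window [now - window, now).

    The 7-day default is a placeholder; the real window size is a pipeline setting.
    """
    window_start = now - window
    # Overlap test: the event ends after the window opens and starts before it closes.
    return [e for e in events if e.end > window_start and e.start < now]
```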
Trivial De-duplication: Events with the same parameters (profile ID, timings, stage, and value) are de-duplicated to prevent recounting the same data point during aggregations.
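Trivial de-duplication can be sketched as keeping the first event for each key built from the parameters listed above (profile ID, timings, stage, and value). The `Event` type and `dedupe_exact` name are hypothetical. Leaving the source out of the key is this sketch's reading of that list, so the same reading reported by two sources collapses into one row.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple, Set, Tuple


class Event(NamedTuple):
    profile_id: str
    source: str
    stage: str
    start: datetime
    end: datetime
    value: float


def dedupe_exact(events: Iterable[Event]) -> List[Event]:
    """Drop events whose profile ID, timings, stage and value repeat an earlier event."""
    seen: Set[Tuple[str, datetime, datetime, str, float]] = set()
    unique: List[Event] = []
    for e in events:
        # The source is intentionally not part of the key (an assumption, see above).
        key = (e.profile_id, e.start, e.end, e.stage, e.value)
        if key not in seen:  # first occurrence wins; input order is preserved
            seen.add(key)
            unique.append(e)
    return unique
```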
Complex De-duplication:
Stage Overlaps:
REM, Light and Deep sleep events are merged with overlapping events of the same sleep stage.
Asleep events are merged with overlapping Asleep, REM, Light and Deep events.
In Bed events are merged with overlapping In Bed, Awake, Asleep, REM, Light and Deep events.
The full sleep session is built by merging In Bed events (for mobiles) or Asleep events (for wearables), tolerating gaps of less than an hour between them. This keeps longer awake periods inside a single session. A gap larger than an hour splits the data into multiple sleep sessions.
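Both the stage-overlap rules and session building reduce to interval merging with a gap tolerance. The sketch below is one possible reading, not Sahha's implementation: `MERGE_GROUPS` assumes that each stage's timeline is the union of the stages it absorbs (for example, Asleep absorbs REM, Light and Deep), stage timelines merge only true overlaps, and sessions bridge gaps strictly shorter than one hour. All names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Set, Tuple

Interval = Tuple[datetime, datetime]

# Assumed reading of the overlap rules: each stage's timeline is the union
# of the intervals of the stages it absorbs.
MERGE_GROUPS: Dict[str, Set[str]] = {
    "rem": {"rem"},
    "light": {"light"},
    "deep": {"deep"},
    "asleep": {"asleep", "rem", "light", "deep"},
    "in_bed": {"in_bed", "awake", "asleep", "rem", "light", "deep"},
}


def merge_intervals(intervals: Iterable[Interval],
                    max_gap: timedelta = timedelta(0)) -> List[Interval]:
    """Sort intervals, then fold each into the previous one if the gap is below max_gap.

    With max_gap=0 only truly overlapping intervals merge (their gap is negative).
    """
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start - merged[-1][1] < max_gap:
            # Extend the running interval, keeping the later end time.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def stage_timeline(events: Iterable[Tuple[str, datetime, datetime]],
                   stage: str) -> List[Interval]:
    """Union of all (stage, start, end) events that count toward `stage`."""
    members = MERGE_GROUPS[stage]
    return merge_intervals((s, e) for st, s, e in events if st in members)


def sleep_sessions(events: Iterable[Tuple[str, datetime, datetime]],
                   wearable: bool) -> List[Interval]:
    """Asleep (wearables) or In Bed (mobiles) timelines, bridging gaps under one hour."""
    base = stage_timeline(events, "asleep" if wearable else "in_bed")
    return merge_intervals(base, max_gap=timedelta(hours=1))
```

Reusing one merge routine with different `max_gap` values keeps the overlap rules and the one-hour session rule consistent with each other.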
Device Selection: The device providing the most consistent and comprehensive data is chosen, typically favoring wearables over mobiles for detailed sleep stage information.
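One way to express "most consistent and comprehensive" as code is a ranking key. The ranking below is purely an illustrative assumption: sources that report detailed stages (REM, Light, Deep) come first, then sources with more distinct stages, then more recorded minutes. `select_device` and `DETAILED_STAGES` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical: detailed stages that typically only wearables report.
DETAILED_STAGES = {"rem", "light", "deep"}


def select_device(events: Iterable[Tuple[str, str, float]]) -> str:
    """Pick one source from (source, stage, minutes) rows of a single session.

    Ranking (an assumption): detailed stages first, then stage variety, then minutes.
    """
    stages: Dict[str, Set[str]] = defaultdict(set)
    minutes: Dict[str, float] = defaultdict(float)
    for source, stage, mins in events:
        stages[source].add(stage)
        minutes[source] += mins
    if not stages:
        raise ValueError("no events to choose from")

    def rank(source: str) -> Tuple[bool, int, float]:
        return (bool(stages[source] & DETAILED_STAGES), len(stages[source]), minutes[source])

    return max(stages, key=rank)
```

Because the key is a tuple, a wearable with stage data wins even if a phone logged more total minutes, which matches the stated preference for wearables.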
Time Zone Parity: If a user changes time zones mid-sleep, all timings are converted to the time zone in which the sleep session started.
Inference Tagging: Each profile is tagged with an inference ID that identifies a unique daily inference for a given batch duration.
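Time zone parity can be sketched with the standard library alone, assuming every timestamp is timezone-aware. The earliest start, compared as an absolute instant, defines the session's zone, and every interval is re-expressed in it with `astimezone`. The function name is hypothetical.

```python
from datetime import datetime
from typing import List, Tuple

Interval = Tuple[datetime, datetime]


def to_session_timezone(events: List[Interval]) -> List[Interval]:
    """Express every (start, end) pair in the time zone where the session started."""
    if not events:
        return []
    if any(s.tzinfo is None or e.tzinfo is None for s, e in events):
        raise ValueError("timestamps must be timezone-aware")
    # Aware datetimes compare as absolute instants, so min() finds the true start.
    session_tz = min(s for s, _ in events).tzinfo
    return [(s.astimezone(session_tz), e.astimezone(session_tz)) for s, e in events]
```

Converting rather than discarding offsets preserves the true elapsed time, so durations computed later are unaffected by the time zone change.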
Advanced Data Aggregation for Complex Features
Data Processing through Aggregation
Aggregation converts raw data into actionable features. We leverage aggregation to accurately combine data from different sources, time scales, and units.
Types of Aggregation:
Temporal Aggregation: Grouping data by time intervals. This method is useful for identifying trends that change over time, such as nightly sleep duration or hourly REM sleep periods.
Spatial Aggregation: Grouping data by geographical regions over a set period. This helps understand trends influenced by geographical parameters, such as sleep patterns in different locations.
We will not focus on Spatial Aggregation from here on because -
Temporal aggregation alone is enough to understand most trends in sleep activity.
Spatial data, such as the user’s residence or office location, raises privacy and security concerns and should be avoided unless necessary. This is also covered in the Data Privacy and Compliance section.
Multi-layered Aggregation
Aggregation is handled through multiple layers to evolve raw data into complex features:
Hourly Aggregation: Sleep events are split across the clock hours they span, producing hourly values that can be further aggregated.
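A minimal sketch of hourly distribution: an event is cut at each clock-hour boundary and the minutes in each piece are credited to that hour. This assumes naive UTC or fixed-offset timestamps, since daylight saving transitions are not handled. `split_by_hour` is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict


def split_by_hour(start: datetime, end: datetime) -> Dict[datetime, float]:
    """Return the minutes of [start, end) that fall in each clock hour, keyed by hour start."""
    buckets: Dict[datetime, float] = defaultdict(float)
    cursor = start
    while cursor < end:
        hour_start = cursor.replace(minute=0, second=0, microsecond=0)
        chunk_end = min(end, hour_start + timedelta(hours=1))
        buckets[hour_start] += (chunk_end - cursor).total_seconds() / 60
        cursor = chunk_end
    return dict(buckets)
```

Summing these buckets per stage and per day yields the daily durations, which is why the hourly layer can feed every layer above it.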
Daily Aggregation: Daily features created from sleep events include -
In Bed Duration
Awake Duration
Asleep Duration
REM Sleep Duration
Light Sleep Duration
Deep Sleep Duration
Sleep Duration - Total duration of full sleep session
Sleep Start Time - Start of the full sleep session
Sleep Mid Time - Midpoint of the full sleep session
Sleep End Time - End of the full sleep session
Sleep Latency - The time it takes to transition from In Bed to Asleep.
Sleep Efficiency - Ratio of asleep duration to in bed duration, which measures sleep quality strictly in terms of duration and wakefulness.
Sleep Interruptions - Number of awake events within the full sleep session.
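Several of these daily features follow directly from one session's intervals. The sketch below assumes the In Bed interval spans the whole session and that Asleep and Awake intervals are already de-duplicated. Counting only awake bouts between sleep onset and final wake as interruptions is an assumption, as are all names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple

Interval = Tuple[datetime, datetime]


@dataclass(frozen=True)
class DailySleepFeatures:
    sleep_duration: timedelta
    sleep_mid_time: datetime
    sleep_latency: timedelta
    sleep_efficiency: float
    sleep_interruptions: int


def daily_features(in_bed: Interval, asleep: List[Interval],
                   awake: List[Interval]) -> DailySleepFeatures:
    """Derive several of the daily features above for one full sleep session."""
    if not asleep:
        raise ValueError("session has no asleep intervals")
    bed_start, bed_end = in_bed
    in_bed_time = bed_end - bed_start
    asleep_time = sum((end - start for start, end in asleep), timedelta())
    sleep_onset = min(start for start, _ in asleep)
    final_wake = max(end for _, end in asleep)
    # Only awake bouts between sleep onset and final wake count as interruptions.
    interruptions = sum(1 for start, end in awake if start > sleep_onset and end < final_wake)
    return DailySleepFeatures(
        sleep_duration=in_bed_time,
        sleep_mid_time=bed_start + in_bed_time / 2,
        sleep_latency=sleep_onset - bed_start,
        sleep_efficiency=asleep_time / in_bed_time,  # timedelta / timedelta -> float
        sleep_interruptions=interruptions,
    )
```

For example, 7 hours asleep in an 8-hour In Bed window gives an efficiency of 0.875.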
Weekly Aggregation: Useful for machine learning models, weekly features include average nightly sleep duration, total REM sleep, and more.
Average In Bed Duration
Average Awake Duration
Average Asleep Duration
Average REM Sleep Duration
Average Light Sleep Duration
Average Deep Sleep Duration
Sleep Start Regularity - Regularity of sleep start time with respect to the ideal sleep time throughout the week
Sleep End Regularity - Regularity of sleep end time with respect to the ideal wake up time throughout the week
Sleep Regularity Index - Quantification of sleep consistency by comparing sleep and wake hours across days in a week. It rewards regularity and penalizes irregularity of sleep.
Sleep Routine - Quantifies the average deviation between ideal and actual sleep times.
Sleep Debt - Total shortfall of sleep hours relative to the amount of sleep required for optimal health and recovery.
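The weekly features can be sketched in a few lines, with clear caveats. `sleep_debt` uses a hypothetical 8-hour need and does not let surplus nights repay debt. `sleep_regularity_index` follows the widely used Sleep Regularity Index of Phillips et al. (2017), which scales the share of epochs in the same sleep/wake state 24 hours apart to a range from -100 to 100; Sahha's exact formulation is not described here. `mean_start_deviation` is one possible reading of Sleep Routine.

```python
from datetime import timedelta
from typing import Sequence


def sleep_debt(nightly_sleep: Sequence[timedelta],
               need: timedelta = timedelta(hours=8)) -> timedelta:
    """Sum of nightly shortfalls below `need` (8 hours and no repayment are assumptions)."""
    return sum((max(need - slept, timedelta(0)) for slept in nightly_sleep), timedelta(0))


def sleep_regularity_index(states: Sequence[Sequence[int]]) -> float:
    """SRI in the form published by Phillips et al. (2017).

    states[d][i] is 1 if asleep and 0 if awake in epoch i of day d, for example the
    24 hourly epochs from hourly aggregation. 100 means identical timing every day.
    """
    if len(states) < 2:
        raise ValueError("need at least two days")
    same = total = 0
    for today, tomorrow in zip(states, states[1:]):
        for a, b in zip(today, tomorrow):
            same += int(a == b)
            total += 1
    return 200 * same / total - 100


def mean_start_deviation(start_minutes: Sequence[int], ideal_minute: int) -> float:
    """Average distance in minutes between actual and ideal sleep start times.

    Times are minutes after midnight; distances wrap so 23:30 and 00:30 are 60 apart.
    """
    if not start_minutes:
        raise ValueError("no sleep start times")
    deviations = []
    for minute in start_minutes:
        diff = abs(minute - ideal_minute) % 1440
        deviations.append(min(diff, 1440 - diff))
    return sum(deviations) / len(deviations)
```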
Generating Valuable Sleep Biomarkers and Scores
Aggregated data is used to generate biomarkers and scores:
Biomarkers: Indirect indicators of sleep quality, such as sleep duration or sleep latency. Biomarkers help users track their sleep trends. Any aggregation can be chosen as a biomarker if it is informative enough; letting users follow its trend is especially useful for health-minded people. A biomarker can be a single aggregation or a combination of multiple aggregations.
Scores: Built on aggregated data using scoring functions. Scores combine multiple aggregations into a direct measure of user performance across various aspects of sleep.
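As an illustration of what a scoring function might look like, the sketch below maps a biomarker to 0 to 100 with a target band that falls off linearly outside it, then combines several biomarker scores with a weighted average. The functions, thresholds, and weights are all hypothetical; Sahha's actual scoring functions are not described in this article.

```python
from typing import Dict


def range_score(value: float, low: float, high: float,
                floor: float, ceiling: float) -> float:
    """Score 100 inside [low, high], falling linearly to 0 at `floor` and `ceiling`."""
    if not floor < low <= high < ceiling:
        raise ValueError("expected floor < low <= high < ceiling")
    if low <= value <= high:
        return 100.0
    if value < low:
        return max(0.0, 100.0 * (value - floor) / (low - floor))
    return max(0.0, 100.0 * (ceiling - value) / (ceiling - high))


def combined_score(parts: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-biomarker scores; the weights are illustrative."""
    total_weight = sum(weights[name] for name in parts)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(score * weights[name] for name, score in parts.items()) / total_weight
```

For instance, with a hypothetical 7 to 9 hour target that reaches zero at 4 hours, a 5.5-hour night scores 50.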
Developers can choose specific offerings as per their requirements. For example, if your use case is specifically bound to the user’s sleep duration, you can pick the duration biomarker, the score, or both. This reduces data footprint, data overhead and architecture costs.
Data Privacy and Compliance
Ensuring data privacy is critical for handling personal data safely and confidentially. Data privacy and compliance are vital for maintaining trust and reputation:
No Personal Data: The data warehouse does not contain personal or identifiable data. Only the user’s profile ID, which maps to data in a separate database, is used.
Data Anonymization: Data points are tagged with a UUID, preventing identification of the actual person.
End-to-End Encryption: The ELT pipeline is encrypted at all stages. Clients must comply with our encryption standards.
Strict Authorization: Only specific people and services can access the data. Modifications are strictly controlled and are made only when requested by the user, such as changing their input data or removing their profile.
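The separation between the warehouse and identity data can be sketched as follows: a registry held outside the warehouse maps an external user ID to a random UUID, and warehouse rows carry only that UUID. `ProfileRegistry` and `warehouse_row` are hypothetical names used for illustration, not part of any Sahha API.

```python
import uuid
from typing import Dict


class ProfileRegistry:
    """Hypothetical identity mapping kept in a separate database, outside the warehouse."""

    def __init__(self) -> None:
        self._profile_ids: Dict[str, str] = {}

    def profile_id_for(self, external_user_id: str) -> str:
        # A random UUID4 carries no information about the person it stands for,
        # so warehouse rows keyed by it cannot be traced back without this mapping.
        if external_user_id not in self._profile_ids:
            self._profile_ids[external_user_id] = str(uuid.uuid4())
        return self._profile_ids[external_user_id]


def warehouse_row(profile_id: str, stage: str, minutes: float) -> Dict[str, object]:
    """Only the pseudonymous profile ID travels with the sleep data."""
    return {"profile_id": profile_id, "stage": stage, "minutes": minutes}
```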
Conclusion
Sleep events undergo comprehensive data processing to ensure clean and standardized data. The cleaned data is then temporally aggregated to produce biomarkers and scores while adhering to strict data privacy practices.
References
(1) Sullivan, E. C., James, E., Henderson, L.-M., McCall, C., & Cairney, S. A. (2023). The influence of emotion regulation strategies and sleep quality on depression and anxiety. Cortex, 166, 286. DOI: 10.1016/j.cortex.2023.06.001
(2) Carskadon, M. A., & Dement, W. C. (2017). Normal Human Sleep: An Overview. In M. Kryger, T. Roth, & W. C. Dement (Eds.), Principles and Practice of Sleep Medicine (6th ed., pp. 15-24). Elsevier.