Data Retention Strategy
1. Introduction
In the Data Archiving section, we introduced the Data Process Strategy, mentioning that EdgeHub retains various types of data in the database, including:
- Real-time (RT) Data
- RAW Data
- Recording Rate
- Hour
- Day
This document describes the data retention rules and their timing in more detail.
2. Data Retention Strategy
EdgeHub's data services comprise:
- Data Worker
  - Processes raw data uploaded from edge devices to the message broker (RabbitMQ / Azure IoT Hub)
  - Retains real-time (RT) data
  - Retains raw historical data (RAW Data)
  - Notifies Data Archiver for historical data computation
  - Notifies Data Restore of out-of-date data (if any) for data recalculation
- Data Archiver
  - Handles historical data computation for parameters and retains computation results at different time precisions
  - Retains Recording Rate data (rate set in parameters)
  - Retains Hour data
  - Retains Day data
- Data Restore
  - Receives notifications of out-of-date data and performs historical data recomputation
  - Computes Recording Rate data
  - Computes Hour data
  - Computes Day data
- Data Packer
  - Regularly backs up cold data in MongoDB to Blob
  - Notifies Data Cleaner to delete backed-up data after completion
  - Periodically deletes cold data in Blob exceeding the retention limit
- Data Cleaner
  - Receives deletion commands from Data Packer and deletes backed-up data from MongoDB

From the above description:
- When data in MongoDB is retained for a certain period, it is identified as cold data, triggering the backup mechanism to move the data to Blob.
- After the data migration is complete, MongoDB data is deleted.
- When data in Blob exceeds the configured maximum retention date (refer to the Data Archiving section), it is removed from Blob.
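The lifecycle summarized above can be sketched as a small classification by data age. This is only an illustration, not EdgeHub code: the names `Stage`, `stage_for_age`, `cold_after_days`, and `max_retention_days` are hypothetical, ages are counted in whole days, and the boundaries are assumed to be exclusive ("older than").

```python
from enum import Enum


class Stage(Enum):
    HOT = "kept in MongoDB"
    COLD = "moved to Blob"
    REMOVED = "deleted from Blob"


def stage_for_age(age_days: int, cold_after_days: int, max_retention_days: int) -> Stage:
    """Classify data by age in whole days.

    Both thresholds are supplied by the caller; this sketch does not
    know the real EdgeHub settings.
    """
    if age_days > max_retention_days:
        return Stage.REMOVED  # past the configured maximum retention
    if age_days > cold_after_days:
        return Stage.COLD     # old enough to have been migrated to Blob
    return Stage.HOT          # still within the MongoDB period


# Example with made-up thresholds: cold after 5 days, removed after 100 days.
for age in (1, 30, 101):
    print(age, stage_for_age(age, cold_after_days=5, max_retention_days=100).name)
```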
2.1 Hot Data Kept in MongoDB
Based on different data types, here are the current settings for the retention time of hot data (kept in MongoDB):
- RAW Data: Retained for 5 days
- Recording Rate Data: Retained for 7 days
- Hour Data: Retained for 45 days
- Day Data: Retained for 45 days
As long as data is within the mentioned timeframes, it will be retained in the MongoDB collection.
2.2 Time to Backup Hot Data into Blob
To ensure that hot data is backed up to Blob by the time its retention period is reached, EdgeHub's Data Packer service initiates data backup early. Here are the current backup times:
- RAW Data: Older than 3 days
- Recording Rate Data: Older than 5 days
- Hour Data: Older than 43 days
- Day Data: Older than 43 days
Special Note: RAW Data older than 3 days is backed up to Blob, and there is currently no mechanism to modify cold data in Blob. Therefore, if a device needs to restore out-of-date data (refer to the explanation in Section 4, Restore Rules), the restore must be completed within 3 days.
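To make the relationship between Sections 2.1 and 2.2 concrete, the sketch below puts both sets of thresholds side by side and computes how early the backup starts. The numbers are copied from the lists above; the dictionary and function names are illustrative, and treating "within 3 days" as an inclusive limit is an assumption.

```python
# Values copied from Sections 2.1 and 2.2; the names are illustrative only.
HOT_RETENTION_DAYS = {"RAW": 5, "Recording Rate": 7, "Hour": 45, "Day": 45}
BACKUP_AFTER_DAYS = {"RAW": 3, "Recording Rate": 5, "Hour": 43, "Day": 43}


def backup_margin_days(data_type: str) -> int:
    """Days between the backup threshold and the end of hot retention."""
    return HOT_RETENTION_DAYS[data_type] - BACKUP_AFTER_DAYS[data_type]


def raw_restore_allowed(age_days: int) -> bool:
    """True while RAW data is not yet older than the backup threshold.

    Assumption: data that is exactly 3 days old can still be restored.
    """
    return age_days <= BACKUP_AFTER_DAYS["RAW"]


for data_type in HOT_RETENTION_DAYS:
    print(f"{data_type}: backup starts {backup_margin_days(data_type)} days before hot retention ends")
```

With the current settings every data type is backed up 2 days before its hot retention period ends.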
2.3 Data Kept Days
Once data is moved to Blob, it becomes cold data, and it will only be deleted when the user-configured data retention period expires. For detailed data retention settings, please refer to the explanation in the Data Archiving document.
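As a rough illustration of this expiry rule, the following sketch selects the Blob entries that have outlived a configured retention period. It is not the Data Packer implementation: the function name, the key layout, and the timestamps are hypothetical, and the cutoff is assumed to be a simple "older than N days" comparison.

```python
from datetime import datetime, timedelta, timezone


def expired_blob_keys(recorded_at_by_key: dict[str, datetime], keep_days: int, now: datetime) -> list[str]:
    """Return the keys whose data is older than keep_days, in sorted order."""
    cutoff = now - timedelta(days=keep_days)
    return sorted(key for key, recorded_at in recorded_at_by_key.items() if recorded_at < cutoff)


# Hypothetical Blob entries, keyed by a made-up path.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
blobs = {
    "object-a/raw/2024-02-01": datetime(2024, 2, 1, tzinfo=timezone.utc),
    "object-a/raw/2024-05-20": datetime(2024, 5, 20, tzinfo=timezone.utc),
}
print(expired_blob_keys(blobs, keep_days=100, now=now))
# prints ['object-a/raw/2024-02-01']
```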
2.4 Example
- Assuming the days to keep object data for object A is set to 100 days:
  - RAW Data in MongoDB:
    - Day 0 to Day -5
  - RAW Data in Blob:
    - Day -4 to Day -31 (starts at Day -4 because data older than 3 days is backed up to Blob early)
  - Recording Rate Data in MongoDB:
    - Day 0 to Day -7
  - Recording Rate Data in Blob:
    - Day -6 to Day -100 (starts at Day -6 because data older than 5 days is backed up to Blob early)
  - Hour/Day Data in MongoDB:
    - Day 0 to Day -45
  - Hour/Day Data in Blob:
    - Day -44 to Day -731 (starts at Day -44 because data older than 43 days is backed up to Blob early)
- Assuming the days to keep object data for object B is set to 7 days:
  - RAW Data in MongoDB:
    - Day 0 to Day -5
  - RAW Data in Blob:
    - Day -4 to Day -7 (starts at Day -4 because data older than 3 days is backed up to Blob early)
  - Recording Rate Data in MongoDB:
    - Day 0 to Day -7
  - Recording Rate Data in Blob:
    - None
  - Hour/Day Data in MongoDB:
    - Day 0 to Day -7
  - Hour/Day Data in Blob:
    - None
- Assuming the days to keep object data for object C is set to 100 days, and the tenant has configured an External Blob:
  - RAW Data in MongoDB:
    - Day 0 to Day -5
  - RAW Data in Blob:
    - Day -4 to Day -100 (starts at Day -4 because data older than 3 days is backed up to Blob early)
  - Recording Rate Data in MongoDB:
    - Day 0 to Day -7
  - Recording Rate Data in Blob:
    - Day -6 to Day -100 (starts at Day -6 because data older than 5 days is backed up to Blob early)
  - Hour/Day Data in MongoDB:
    - Day 0 to Day -45
  - Hour/Day Data in Blob:
    - Day -44 to Day -731 (starts at Day -44 because data older than 43 days is backed up to Blob early)