When presented with a series of events, many developers will first be tempted to sort the events by timestamp. This is dangerous because timestamps do not provide the strict ordering they might assume.
Out-of-order events can lead to infrequent but significant bugs: consider "add to basket then checkout" vs "checkout then add to basket".
Instead of timestamps, developers should prefer simple counters and proper conflict detection. Timestamps may still be useful, but should be approached with caution due to the complexities outlined below.
Time resolution is not infinite
What happens when two events have the same timestamp?
When writing to a log file, two entries with the same timestamp are not a problem because the lines in the log file provide the real order of the data. However, import those entries into a database and the original line order is lost. Now, when sorted by timestamp, two entries with the same value may be returned in an undefined order.
The chances of a duplicate timestamp are affected by:
- The resolution of the hardware clock and time APIs - running your code on another platform may significantly increase your chance of duplicates.
- The resolution of your timestamps - do you store seconds since epoch (resolution 1 second)? nanoseconds? a date string with HH:MM (resolution 1 minute)?
- The frequency of events
Events in close proximity may record identical timestamps causing them to appear shuffled. To make matters worse, this is most likely when a machine is under heavy load.
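To make the shuffling concrete, here is a minimal sketch in Python. The `Event` class, its `seq` field and the sample values are all hypothetical, made up for illustration. Three events are recorded within the same second, then read back from storage in an arbitrary order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    seq: int        # hypothetical counter assigned when the event is recorded
    timestamp: int  # whole seconds since the epoch (1 second resolution)
    action: str


# Three events recorded within the same second, in this order.
recorded = [
    Event(seq=1, timestamp=1_700_000_000, action="add to basket"),
    Event(seq=2, timestamp=1_700_000_000, action="checkout"),
    Event(seq=3, timestamp=1_700_000_000, action="view receipt"),
]

# Simulate a store that hands rows back in an arbitrary order.
stored = [recorded[1], recorded[2], recorded[0]]

# Sorting by timestamp cannot separate equal values, so the arbitrary
# storage order survives the sort.
by_timestamp = sorted(stored, key=lambda e: e.timestamp)

# Sorting by the counter recovers the order the events were recorded in.
by_counter = sorted(stored, key=lambda e: e.seq)

print([e.action for e in by_timestamp])
print([e.action for e in by_counter])
```

Python's sort is stable, so equal timestamps keep whatever order the input had. A database makes no such promise, which is why the result there is undefined rather than merely unhelpful.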
Clocks can go backwards
If, like me, you experience time in one direction, this is easy to forget. A clock is merely a device to measure time and as such requires calibration and adjustment.
Manual adjustments, like when a user naively changes timezone or corrects a slow clock, are the most likely cause of a jump backwards in time, but automatic changes can also be to blame.
If a developer generates timestamps or stores timezone data incorrectly, the automatic change from daylight saving time could jump events backwards by a whole hour. We have to be particularly careful in the UK, where GMT can happily masquerade as UTC for half the year.
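As an illustration, the sketch below stores local wall-clock time without an offset for two events either side of the UK clock change on 29 October 2023. The fixed-offset `BST` and `GMT` zones are stand-ins I've assumed in place of a real timezone database, and the event times are invented.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for UK local time before and after the change.
BST = timezone(timedelta(hours=1), "BST")
GMT = timezone(timedelta(0), "GMT")

# Two events 40 minutes apart. UK clocks went back at 01:00 UTC.
first = datetime(2023, 10, 29, 0, 30, tzinfo=timezone.utc)
second = datetime(2023, 10, 29, 1, 10, tzinfo=timezone.utc)

# The mistake: keep the local wall-clock reading and throw away the offset.
first_local = first.astimezone(BST).replace(tzinfo=None)    # 01:30
second_local = second.astimezone(GMT).replace(tzinfo=None)  # 01:10

print(first < second)              # the real order
print(second_local < first_local)  # the stored values say the opposite
```

Storing UTC, or local time together with its offset, avoids this particular trap, though not the others described in this article.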
Services like ntpd (Network Time Protocol Daemon) can also cause dramatic clock changes. Depending on configuration, a large drift in system time can cause ntpd to hop immediately to the correct time (possibly backwards). Devices like the Raspberry Pi are particularly vulnerable to this as they are frequently disconnected from a network and have no Real Time Clock.
There are clocks guaranteed to never run backwards, called 'monotonic' clocks, but a timestamp from a monotonic clock is often of little use between reboots, and useless to compare between machines. Generally, a monotonic clock is used to measure a time interval on a single machine.
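In Python, for example, `time.monotonic()` exposes such a clock. A minimal sketch of measuring an interval with it follows; the `measure` helper is a hypothetical name, not a library function.

```python
import time


def measure(task):
    """Call task() once and return (result, elapsed_seconds)."""
    start = time.monotonic()  # cannot go backwards, unlike time.time()
    result = task()
    elapsed = time.monotonic() - start
    return result, elapsed


result, elapsed = measure(lambda: sum(range(1_000_000)))
print(f"took {elapsed:.6f}s")
```

The value returned by `time.monotonic()` has no defined reference point, so only the difference between two readings taken on the same machine is meaningful.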
Intervals can stretch and shrink
Jumps in time can cause problems, so services like ntpd often prefer to slow down or speed up the system clock until it gradually approaches the correct time (this is called 'slew' correction).
Google uses a similar approach for leap seconds, 'smearing' an extra second over a 24 hour period, instead of bamboozling software with a 61 second minute.
Even if you could start a timer on multiple machines at a known instant in time and stop them at another instant, they would likely measure a subtly different elapsed time. The longer the interval, the more apparent manufacturing tolerances will be. As an example, Adafruit advises that its PCF8523-based RTC "may lose or gain a second or two per day".
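To put "a second or two per day" in perspective, here is a rough back-of-the-envelope calculation. The function names are hypothetical, and the sketch assumes the drift rate stays constant, which real hardware won't guarantee.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def drift_ppm(seconds_per_day):
    """Express a daily drift as parts per million."""
    return seconds_per_day / SECONDS_PER_DAY * 1_000_000


def accumulated_drift(seconds_per_day, days):
    """Total drift in seconds after a number of days without correction."""
    return seconds_per_day * days


print(f"{drift_ppm(1):.2f} ppm")             # 1 second per day
print(f"{accumulated_drift(2, 30)} seconds")  # 2 seconds per day for 30 days
```

Two such clocks drifting in opposite directions would diverge at twice that rate.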
Clocks are never in sync
A developer may be attracted to timestamps because they're easy to collect at multiple sites then insert into an ordered collection later. However, in addition to all of the above, they must now consider the disparity between multiple system clocks.
For example, if you reply to a chat message from a machine whose clock runs slightly behind the sender's, your reply can easily be recorded with an earlier timestamp than the original message.
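The chat scenario can be sketched as follows. The names, the clock offsets and the "true" times are all invented for illustration; in practice you never get to observe the true time directly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    text: str
    timestamp: float  # seconds, as read from the sender's local clock


TRUE_TIME_SENT = 1000.0   # the instant the original was really sent
TRUE_TIME_REPLY = 1002.0  # the reply really happened 2 seconds later

ALICE_CLOCK_OFFSET = 0.0  # Alice's clock happens to be accurate
BOB_CLOCK_OFFSET = -5.0   # Bob's clock is 5 seconds behind

original = Message("Lunch?", TRUE_TIME_SENT + ALICE_CLOCK_OFFSET)
reply = Message("Sure!", TRUE_TIME_REPLY + BOB_CLOCK_OFFSET)

# Sorting by timestamp puts the reply before the message it answers.
timeline = sorted([original, reply], key=lambda m: m.timestamp)
print([m.text for m in timeline])
```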
Timestamps are complex. They're difficult to store and generate correctly, they're almost impossible to compare accurately across machines, and they cannot guarantee a strict causal ordering of events.
When you sort data by timestamp it almost always implies a causal relationship (e.g. implying a message happened before its reply, or a form GET happened before a POST). Because of this, techniques that provide a strict (or at least causal) ordering of events should be preferred.
Use a counter
The most fool-proof alternative to timestamps is an incremental counter stored on a single machine. If there is only one instance of the software, or clients always submit to a central server, this is often the best choice.
Most databases provide an auto increment or sequence type that can provide a suitable value.
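As a small sketch, here is SQLite's `AUTOINCREMENT` used from Python's standard `sqlite3` module. The table and column names are hypothetical. Both rows share a timestamp, yet the `id` column still records the order in which they were inserted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"  # the counter used for ordering
    " recorded_at INTEGER NOT NULL,"          # kept for display only
    " action TEXT NOT NULL)"
)

# Two events that share the same second-resolution timestamp.
for action in ("add to basket", "checkout"):
    conn.execute(
        "INSERT INTO events (recorded_at, action) VALUES (?, ?)",
        (1_700_000_000, action),
    )
conn.commit()

# Order by the counter, not by the timestamp.
rows = conn.execute("SELECT id, action FROM events ORDER BY id").fetchall()
print(rows)
```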
Consider distributed clocks
If you need to generate points in a sequence at multiple sites, then you may need a more complex series of counters like Lamport timestamps or a vector clock. Distributed clocks like these provide a partial causal ordering of events, and vector clocks in particular provide a means to detect conflicts (i.e. events that are seen as concurrent because they extend a shared point in history).
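A minimal vector clock might look like the sketch below. It represents each clock as a plain dictionary of per-node counters. The function names and the "alice"/"bob" node identifiers are my own choices for illustration, not a standard API.

```python
def increment(clock, node):
    """Return a copy of clock with node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def merge(a, b):
    """Element-wise maximum: everything either clock has seen."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}


def compare(a, b):
    """Return 'before', 'after', 'equal' or 'concurrent'."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


shared = increment({}, "alice")          # a point in history both have seen
alice_edit = increment(shared, "alice")  # Alice extends it
bob_edit = increment(shared, "bob")      # Bob extends it independently

print(compare(shared, alice_edit))    # causally ordered
print(compare(alice_edit, bob_edit))  # neither happened before the other

# Once the conflict is resolved, the result descends from both edits.
resolved = increment(merge(alice_edit, bob_edit), "alice")
print(compare(bob_edit, resolved))
```

The clock only reports that the two edits are concurrent. It says nothing about which one should win.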
If your clients generate timestamps locally but the data is only integrated by a central server (not shared peer-to-peer), your logical clock can be relatively simple, requiring only two peers.
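One possible reading of this, sketched below, is a single version counter owned by the server: each client submits its change along with the last version it saw, and the server flags a conflict when that version is stale. This is a plain optimistic version check rather than a full logical clock, and the `Server` class and its `submit` method are hypothetical.

```python
class Server:
    def __init__(self):
        self.version = 0  # advanced once per accepted change
        self.log = []     # accepted changes, in order

    def submit(self, base_version, change):
        """Accept change only if the client had seen every earlier change."""
        if base_version != self.version:
            # The client extended an old point in history: surface it.
            return ("conflict", self.version)
        self.version += 1
        self.log.append((self.version, change))
        return ("accepted", self.version)


server = Server()
first = server.submit(0, "rename to A")  # client 1, based on version 0
stale = server.submit(0, "rename to B")  # client 2, also based on version 0
retry = server.submit(1, "rename to B")  # client 2, after seeing version 1

print(first, stale, retry)
```

What the second client should do on a conflict (retry, merge, or ask the user) is a domain decision the sketch deliberately leaves open.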
Distributed clocks will only help you detect concurrent events. Once detected, the problem of resolving conflicting events is often domain-specific. Using the appropriate clock or data structure should force you to handle these conflicts early on. Remember, the conflicts were always present with regular timestamps; they were just not being surfaced in your design.
Conflict detection and resolution can get as fancy and as complicated as you like, including employing tools like git to provide a full history. That said, it's so hard to imagine an architecture that started with simple timestamps and ended with git that I'm going to suggest you try a distributed clock or simple counter first.
When are timestamps appropriate?
I'm only suggesting timestamps are a bad way to order causally linked events. Timestamps are still useful for:
- Communication with humans - Logical clocks don't mean a lot to us. Adding a timestamp as part of the presentation (but not ordering) of data is often a good idea as it lets us place entries in a wider context outside of a single application.
- Sampling - Data collected for statistical analysis is often collected ad-hoc from multiple sources and strictly ordering measurements in close proximity may not be important. Ask yourself: "If I shuffled a few events around would my conclusions still be sound?"
A frequent bugbear
I'm recording my arguments against ordering by timestamp here as a reference because it's a conversation I frequently have in architecture meetings. I hope this is a useful reference for you too, and if you have any relevant experience please do share it with me.