Dispatching #
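There are two ways to dispatch events: pipes and reducers. Each is configured with a timestamp inheritance setting and a backfill merge mode. Together, these determine how the timestamp of a dispatched event is assigned, both during backfill and at the head of the stream. In the table below, "destination" means the timestamp is assigned by the destination stream during ingestion, and "parent" means it is inherited from the parent event.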
inherit timestamp | backfill merge | backfill | head |
---|---|---|---|
yes | concat | destination | parent |
yes | splice | parent | parent |
no | concat | destination | destination |
no | splice | parent | destination |
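The table can also be read as a lookup keyed on the three inputs. The sketch below is only an illustration of that reading; the names `Merge`, `Phase`, and `timestamp_source` are hypothetical and not part of any real API.

```python
from enum import Enum


class Merge(Enum):
    CONCAT = "concat"
    SPLICE = "splice"


class Phase(Enum):
    BACKFILL = "backfill"
    HEAD = "head"


# (inherit timestamp, backfill merge, phase) -> where the timestamp comes from.
# Each entry mirrors one cell of the table above.
TIMESTAMP_SOURCE = {
    (True, Merge.CONCAT, Phase.BACKFILL): "destination",
    (True, Merge.CONCAT, Phase.HEAD): "parent",
    (True, Merge.SPLICE, Phase.BACKFILL): "parent",
    (True, Merge.SPLICE, Phase.HEAD): "parent",
    (False, Merge.CONCAT, Phase.BACKFILL): "destination",
    (False, Merge.CONCAT, Phase.HEAD): "destination",
    (False, Merge.SPLICE, Phase.BACKFILL): "parent",
    (False, Merge.SPLICE, Phase.HEAD): "destination",
}


def timestamp_source(inherit: bool, merge: Merge, phase: Phase) -> str:
    """Return "parent" or "destination" for a given configuration and phase."""
    return TIMESTAMP_SOURCE[(inherit, merge, phase)]
```

Note that with `concat`, inheritance only takes effect at the head, while with `splice`, backfilled events carry the parent timestamp regardless of the inheritance setting.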
Backfill merge #
During backfill, events dispatched by the reducer or pipe can be concatenated or spliced into the destination stream.
concat
appends events dispatched during backfill to the end of the destination stream. If the event is timely, it will be successfully appended; otherwise, it will be rejected (and the rejection logged on the origin stream).
splice
creates a splice request and queues all events dispatched during backfill in that request. Once the reducer or pipe has caught up to the stream head, subsequent dispatched events are accepted or rejected based on normal timeliness rules. The stream owner can merge or reject the splice request. If it is merged, the stream is reindexed and reprocessed with the merged events.
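To make the difference between the two modes concrete, here is a toy model. It assumes that "timely" means the event's timestamp is not older than the newest event already in the destination stream, and that merging a splice request amounts to re-sorting the stream by timestamp. Both are simplifying assumptions, and `Event`, `Stream`, and `SpliceRequest` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    timestamp: int
    payload: str


@dataclass
class Stream:
    events: list = field(default_factory=list)

    def head_timestamp(self) -> int:
        # Timestamp of the newest event, or 0 for an empty stream.
        return self.events[-1].timestamp if self.events else 0

    def concat(self, event: Event) -> bool:
        """Append the event if it is timely; return False if it is rejected."""
        if event.timestamp >= self.head_timestamp():
            self.events.append(event)
            return True
        return False


@dataclass
class SpliceRequest:
    queued: list = field(default_factory=list)

    def queue(self, event: Event) -> None:
        # During backfill nothing is written to the destination yet.
        self.queued.append(event)

    def merge_into(self, stream: Stream) -> None:
        # Assumed stand-in for "reindexed and reprocessed": re-sort by timestamp.
        stream.events = sorted(stream.events + self.queued,
                               key=lambda e: e.timestamp)
        self.queued = []
```

In this model, a backfilled event older than the stream head is rejected by `concat` but survives in a splice request until the owner merges or rejects it.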
Timestamp inheritance #
By default, dispatched events are not assigned a timestamp. Instead, they are assigned a timestamp during ingestion. However, both reducer and pipe dispatches can be configured to inherit the timestamp of the parent event. This is only recommended when the reducer or pipe is the only event source for the destination stream.
Inheriting timestamps is dangerous whenever the destination stream has other event sources: if the destination stream’s timestamp has advanced by the time the dispatched event arrives, the event will be rejected.
Reducer processing speed can create data races that lead to dispatched events being sporadically rejected. Dispatches have a guaranteed order per destination stream (i.e., events dispatched by a reducer at index 0 will always be dispatched before events dispatched by the same reducer at index 1). This adds a trivial but O(n) latency per dispatch, during which other streams dispatching to the same destination stream may advance the destination stream’s timestamp, causing the dispatched events to be rejected.
Even without data races, if your reducer encounters an error and is paused until a fix is deployed, it and its dispatched events will be considerably behind the wall-clock time. Unless the reducer is the only source of events, it’s likely that while the reducer is catching up with the stream head, any dispatched events will be rejected.
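The rejection scenario can be reproduced with a small simulation. This is a sketch under stated assumptions: the destination rejects any event whose timestamp is older than its current head, and events without an inherited timestamp are stamped with the arrival clock at ingestion. The function `ingest` and its tuple format are hypothetical.

```python
def ingest(arrivals):
    """Simulate a destination stream receiving events in arrival order.

    arrivals: list of (label, inherited_ts, arrival_clock) tuples, where
    inherited_ts is None when the timestamp is assigned at ingestion.
    Returns (accepted_labels, rejected_labels).
    """
    head = 0
    accepted, rejected = [], []
    for label, inherited_ts, arrival_clock in arrivals:
        # Without inheritance, the destination assigns the timestamp itself.
        ts = arrival_clock if inherited_ts is None else inherited_ts
        if ts >= head:
            head = ts
            accepted.append(label)
        else:
            rejected.append(label)
    return accepted, rejected


# Another source writes at clock 105; the reducer's dispatch arrives at 106
# but carries its parent's timestamp of 100, so it is rejected.
with_inheritance = ingest([("other", None, 105), ("dispatch", 100, 106)])

# The same dispatch without inheritance is stamped 106 and accepted.
without_inheritance = ingest([("other", None, 105), ("dispatch", None, 106)])
```

The larger the gap between the parent timestamp and the arrival time, as with a reducer catching up after a pause, the more likely another source has already moved the head past it.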
While there are cases where assigning timestamps is useful, the timestamp should generally be considered a system property for ordering events, not a property of the event itself. In addition, consistency with third-party data sources is impossible to guarantee, and events may arrive out of order. If strict ordering is required, it’s best to set up a “buffer” stream that waits for events to arrive, re-orders them within a reducer’s state, and then dispatches them to the finalized stream.
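One way to picture the buffer stream's reducer is as a small re-ordering buffer held in state. The sketch below assumes each event carries a third-party `source_time` in its payload and that a fixed `max_delay` bounds how late an event can arrive; the `ReorderBuffer` class and both names are hypothetical and do not reflect a real reducer API.

```python
import heapq


class ReorderBuffer:
    """Holds events in state and releases them in source_time order."""

    def __init__(self, max_delay: int):
        self.max_delay = max_delay
        self.pending = []      # min-heap of (source_time, seq, payload)
        self.max_seen = None   # newest source_time observed so far
        self.seq = 0           # tie-breaker that preserves arrival order

    def reduce(self, source_time: int, payload: str) -> list:
        """Buffer one event; return the events that are now safe to dispatch."""
        heapq.heappush(self.pending, (source_time, self.seq, payload))
        self.seq += 1
        if self.max_seen is None or source_time > self.max_seen:
            self.max_seen = source_time
        # Anything at or below the watermark is assumed to be complete.
        watermark = self.max_seen - self.max_delay
        ready = []
        while self.pending and self.pending[0][0] <= watermark:
            ts, _, p = heapq.heappop(self.pending)
            ready.append((ts, p))
        return ready
```

The events returned by `reduce` would then be dispatched to the finalized stream without timestamp inheritance, so the system still assigns the ordering timestamp. An event that arrives later than `max_delay` would still come out of order in this sketch, so the delay needs to be chosen with the data source in mind.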
Substreams #
Substream piping should almost never inherit timestamps. Dispatches between streams are not atomic and therefore can arrive out of order.
Dispatch errors #
Dispatches are not connected to the reducer runtime, so rejected dispatches will not cause reducers to pause.