Deletions
TileDB supports deletions in sparse arrays, such as “delete all cells that satisfy condition a > 10 AND d < 3”, where a could be an attribute and d a dimension. Similar to writes, deletes are immutable and timestamped. However, because all fragments created by previous writes are immutable, a deletion cannot alter them. To address this, TileDB does not delete or rewrite any cells when the deletion is executed. Instead, it writes the delete condition to a special timestamped file. During any read that can see this delete file, TileDB evaluates the delete condition on the fly, alongside any slicing and other query conditions, while also respecting any time-traveling settings. Because nothing is modified in place, TileDB can guarantee atomicity and consistency in the presence of multiple concurrent writes and reads without locking.
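The read-time mechanism can be illustrated with a small, self-contained Python sketch. This is a toy in-memory model, not TileDB's API or implementation: `ToyArray`, `Fragment`, and `DeleteRecord` are hypothetical names, cells are plain dicts, duplicate cells are allowed, and we assume that a delete hides only cells written strictly before the delete's timestamp. Writes and deletes only ever append; the delete condition is applied during the read, filtered by the read's timestamp.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Cell = Dict[str, int]               # e.g. {"d": 1, "a": 20}: dimension d, attribute a
Condition = Callable[[Cell], bool]  # a delete or query predicate over one cell


@dataclass(frozen=True)
class Fragment:
    timestamp: int
    cells: tuple  # immutable once written


@dataclass(frozen=True)
class DeleteRecord:
    timestamp: int
    condition: Condition  # only the condition is stored; no cell is touched


@dataclass
class ToyArray:
    fragments: List[Fragment] = field(default_factory=list)
    deletes: List[DeleteRecord] = field(default_factory=list)

    def write(self, ts: int, cells: List[Cell]) -> None:
        # A write appends a new immutable fragment.
        self.fragments.append(Fragment(ts, tuple(cells)))

    def delete(self, ts: int, condition: Condition) -> None:
        # A delete appends a timestamped condition; existing fragments are untouched.
        self.deletes.append(DeleteRecord(ts, condition))

    def read(self, at: int, query: Condition = lambda c: True) -> List[Cell]:
        # Time travel: only fragments and deletes at or before `at` are visible.
        visible_deletes = [d for d in self.deletes if d.timestamp <= at]
        result = []
        for frag in self.fragments:
            if frag.timestamp > at:
                continue
            for cell in frag.cells:
                if not query(cell):
                    continue
                # Assumption: a delete only hides cells written strictly before it.
                deleted = any(
                    frag.timestamp < d.timestamp and d.condition(cell)
                    for d in visible_deletes
                )
                if not deleted:
                    result.append(cell)
        return result


arr = ToyArray()
arr.write(1, [{"d": 1, "a": 20}, {"d": 2, "a": 5}, {"d": 5, "a": 30}])
arr.delete(2, lambda c: c["a"] > 10 and c["d"] < 3)  # "a > 10 AND d < 3"
arr.write(3, [{"d": 1, "a": 50}])

print(arr.read(at=1))  # all three original cells
print(arr.read(at=2))  # {"d": 1, "a": 20} is filtered out
print(arr.read(at=3))  # the cell written after the delete is visible
```

Reading at timestamp 1 still returns {"d": 1, "a": 20} because the delete record is not yet visible; at timestamp 2 that cell is filtered out, and the cell written at timestamp 3 survives because it postdates the delete. Note that every read re-evaluates every visible delete condition, which is why many deletes slow reads down.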
On the downside, applying numerous delete conditions can severely degrade read performance, since every read must evaluate all of them. In such scenarios, you should run consolidation, which materializes the delete conditions in one of two ways:
- Consolidate all cells, existing and deleted, into a new fragment. This fragment contains two special attributes: one that stores the timestamp at which each cell was written, and one that stores the timestamp at which a cell was deleted, if it was. The delete conditions themselves are stored in the fragment metadata of the consolidated fragment. This lets you time travel with full fidelity in the presence of deletions.
- Consolidate all cells, but purge the deleted ones. This yields the best storage footprint and future read performance, but the deleted cells are removed for good: you will not be able to see them even if you time travel to a timestamp before the deletion occurred.
You select between these two modes via special consolidation configuration parameters.
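The two consolidation modes can be sketched the same way. This is again a hypothetical Python toy, not TileDB code: `consolidate` and `read_consolidated` are illustrative names, and `write_ts`/`delete_ts` stand in for the two special attributes described above (their actual names in TileDB may differ). With `purge=False`, every cell is kept and tagged with its write and delete timestamps, so a time-traveling read can reconstruct the pre-deletion state; with `purge=True`, deleted cells are dropped. In TileDB itself the choice is made through a consolidation configuration parameter; we believe it is `sm.consolidation.purge_deleted_cells`, but check the configuration reference for your version.

```python
from typing import Callable, Dict, List, Optional, Tuple

Cell = Dict[str, int]
Condition = Callable[[Cell], bool]
Row = Dict[str, Optional[int]]


def consolidate(
    fragments: List[Tuple[int, List[Cell]]],
    deletes: List[Tuple[int, Condition]],
    purge: bool,
) -> List[Row]:
    """Merge all fragments into one list of rows, materializing the deletes."""
    rows: List[Row] = []
    for write_ts, cells in fragments:
        for cell in cells:
            # Earliest delete issued after this write whose condition matches the cell.
            hits = [ts for ts, cond in deletes if ts > write_ts and cond(cell)]
            delete_ts = min(hits) if hits else None
            if purge and delete_ts is not None:
                continue  # purge mode: the deleted cell is gone for good
            row: Row = dict(cell)
            row["write_ts"] = write_ts    # hypothetical write-timestamp attribute
            row["delete_ts"] = delete_ts  # hypothetical delete-timestamp attribute (None if never deleted)
            rows.append(row)
    return rows


def read_consolidated(rows: List[Row], at: int) -> List[Row]:
    """Time-travel read: a row is visible if written by `at` and not yet deleted at `at`."""
    return [
        {k: v for k, v in r.items() if k not in ("write_ts", "delete_ts")}
        for r in rows
        if r["write_ts"] <= at and (r["delete_ts"] is None or r["delete_ts"] > at)
    ]


fragments = [(1, [{"d": 1, "a": 20}, {"d": 2, "a": 5}]), (3, [{"d": 1, "a": 50}])]
deletes = [(2, lambda c: c["a"] > 10 and c["d"] < 3)]

kept = consolidate(fragments, deletes, purge=False)
purged = consolidate(fragments, deletes, purge=True)

print(read_consolidated(kept, at=1))    # includes the since-deleted cell
print(read_consolidated(purged, at=1))  # the deleted cell is no longer recoverable
```

Time traveling to timestamp 1 returns the since-deleted cell from the non-purged result but not from the purged one, while reads at the latest timestamp agree. That is the fidelity-versus-performance trade-off between the two modes.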