Inside a Collaborative Text Editor

Wednesday, 22 July 2020

Introducing the Editor

Collaborative editors are defined by the size and speed of their updates. On a website you might submit a form, but in a collaborative editor you can send a single character or key press.

A side-by-side demo of editing in Google Docs

Those tiny edits are shared quickly so you feel connected to your collaborators and can anticipate their actions. This experience is described as real-time editing.

Inside your editor, however, the frequent edits form a hotbed of conflicting updates. Solving or avoiding these conflicts is the real challenge of a collaborative text editor.


Taking turns, two authors might avoid a conflict - if they have the patience.

But our collaborative editor has many authors. To block one author's key press until another releases their keyboard would be infuriating. Instead, they each edit a copy of the document and submit changes when ready.

To submit, the first author can simply swap his copy for the original - with no prior updates, his edits apply cleanly.

The second author, however, will find - in place of the original - the first author's copy. Where she re-worded a sentence, he might have deleted the whole paragraph. She cannot replace the document without losing his work, so she must compare the two and decide how, and if, her edits still apply.

Most collaborative editors do this automatically, behind the scenes, for every change to a document.

Operational Transformation

The process of re-writing your edits to account for the work of others is called Operational Transformation. It's the most popular way to implement a collaborative editor and is used by Etherpad and Google Docs.

Concrete examples will follow, but for now let's start with two tubes: the kind you buy tennis balls in.

Each tube has a starting state - a yellow ball - and each is held by a different artist. At the same time, each artist adds a coloured ball to the tube and informs their partner.

Two tubes, the left showing a yellow ball and a blue ball stacked on top, the right showing a red ball in place of blue. Beside the tubes are two artists saying 'blue' and 'red' respectively.

If they apply their partner's update naively, their tubes will look different: red-blue-yellow one side and blue-red-yellow the other.

Two tubes, the left showing from top to bottom: red, blue, yellow, the right: blue, red, yellow.

No single tube is correct - the colours were added simultaneously - but the artists should agree a result.

To resolve the tie, they can invent an arbitrary rule. One that, when applied by both artists, will arrive at the same state. They could discard every ball… but the result is disappointing. How about: "when inserts are tied, insert red before blue"?

Two tubes, both showing from top to bottom: blue, red, yellow.

By applying the same rule their states converge. This is the principle behind Operational Transformation and it can be applied to many authors typing at once.

Handling text

In a text document, you can improve such a rule by examining the position of the text.

Tama under a blanket with the caption 'sleepy cat'.

Here's a photo of our cat Tama. I've labelled it "sleepy cat".

Tama suggesting "very sleepy cat". Yui suggesting "sleepy grumpy cat".

Tama and Yui (our other cat) both update my caption at different positions. When Tama receives Yui's edit, she'll need to shift Yui's "grumpy" right to account for her insert of "very".

Two editors with the same state: very sleepy grumpy cat. Tama has applied the transformed message 'INS "grumpy " 12'.

If she applied Yui's edit at its original character position, Tama would have the state "very slgrumpy eepy cat". But by understanding how their concurrent edits affect one another, Tama's editor can recover.

Unfortunately, we can't rely on position alone - what if they both insert text at the same location?

Tama on top of the blanket with the caption "Tama".

Another photo, labelled simply: "Tama".

Tama suggesting "happy Tama". Yui suggesting "playful Tama".

Tama and Yui both make an edit at the beginning - position 0.

Two editors showing a conflict. Tama has applied 'INS "playful " 0' and has the state "playful happy Tama". Yui has applied 'INS "happy " 0' and has the state "happy playful Tama".

Like our artists, Tama and Yui are tied for position. To resolve this, we invent another arbitrary rule and order the inserts by Tama and Yui's editor ID.

Two editors with the conflict resolved. Tama has applied the transformed message 'INS "playful " 6' and now also has the state "happy playful Tama".

Yui has ID [2], which is greater than Tama's [1], so Tama shifts Yui's insert after her own to converge on: "happy playful Tama".

We applied only two concurrent edits, but if Tama made multiple changes, she would need to transform Yui's insert for each - potentially shifting it multiple times before its final position.

I hope you feel confident after that, because - to demonstrate the risks - I'm going to present a puzzle that threw the early algorithms off-course.

False ties

A photo of Yui with the caption 'little cat'.

The puzzle begins with a third author. And, since we only have two cats - that's Yui above - I've introduced a fictional bunny: Bandit.

Tama suggesting "little loud cat". Bandit suggesting "cat". Yui suggesting "cute little cat".

Bandit doesn't think Yui is small and deletes the word 'little'. After applying all their edits, we expect the final caption to read: "cute loud cat".

But there's a problem - because Bandit deletes the word separating Tama and Yui's edits, a transform step might (falsely) tie them for position. I'll demonstrate by stepping through Tama and Bandit's transforms.

Tama applying 'INS "loud " 7', 'INS "cute " 0', and the transformed message 'DEL 5..12' to get the state "cute loud cat".

Tama arrives at the correct result "cute loud cat" because she applies Bandit's delete after her and Yui's inserts. Bandit, however, applies his delete first.

Bandit applying 'DEL 0..7', the transformed message 'INS "loud " 0', and the transformed message 'INS "cute " 0' to get the state "loud cute cat".

His delete shifts Tama's "loud" to position 0. Yui's edit now ties with Tama's, so Bandit must apply the arbitrary rule and order their inserts by editor ID - [1] Tama's "loud", followed by [2] Yui's "cute". Bandit incorrectly arrives at "loud cute cat".

If you didn't anticipate that you're in good company. Many academic papers on Operational Transformation (OT) suffered similar counter-examples (or bugs, to the practitioner) on publication. Thankfully these puzzles have now been solved, as evidenced by the widespread use of OT in the wild.

You can find the solution to our puzzle in the 30 years of OT research since 1989. But to keep your curiosity in check, consider the following: if an edit could be undone, conflicts could be rewound and replayed in a fixed order. It's more work, and could still cause a false tie, but at least everyone would encounter the same tie.

Hopefully you're now suitably paranoid. Operational Transformation is easy to explain but complex to implement. If only there was a way to avoid conflicts altogether…

Avoiding conflicts

Remember the two artists? I have another commission for them, this time using coloured liquid instead of balls.

They each start with a tube containing yellow dye, to which they add their own colours. Yellow and blue to make green; yellow and red to make orange.

A blue drop being added to the left tube to make green. A red drop being added to the right tube to make orange.

Finally, they inform each other and add their partner's dye to their own.

A red drop being added to the left tube to make a sort of grey colour. A blue drop added to the right tube to make the same grew colour.

What a lovely colour! - but at least they didn't have to apply any transforms to converge on it. Because the artists used liquid, the order they added the dye is irrelevant.

Using a data type with the same property for text we could eliminate conflicts there too.

Conflict-free replicated data types

Tama suggesting "very sleepy cat". Yui suggesting "sleepy grumpy cat".

One elegant data type, called LOGOOT (an example of a conflict-free replicated data type, or CRDT) is built upon each edit holding a unique position - represented by a list of integers.

"[10, 0, 1] SLEEPY, [11, 0, 2] CAT"

The list is comprised of: a positioning number, the author's editor ID, and a counter.

Every edit increments the author's counter. In combination with their editor ID, it ensures the position is unique. The positioning number places the text in the document.

To insert a word at the beginning of the document (before "sleepy") Tama [1] would create a new position list that starts with a random value between 0 and 10.

"[6, 1, 1] VERY, [10, 0, 1] SLEEPY, [11, 0, 2] CAT"

Yui [2], unable to generate an integer between 10 and 11, chooses a new random positioning number and appends her values to the preceding list. By growing the list, Yui can always find a position between two adjacent words.

"[6, 1, 1] VERY, [10, 0, 1] SLEEPY, [10, 0, 1, 130, 2, 1] GRUMPY, [11, 0, 2] CAT"

You can insert these four words - ordered by their position lists - in any sequence. You'll always arrive at the same result.

The example above uses a position list per word, but you'll probably want a position list per character - occupying significantly more space than the text alone in exchange for predictable performance.

Praise for the performance of CRDTs often rests on the worst-case scenario: a long conflicting history of edits. In practice, Operational Transformation can perform millions of simple transforms a second and grants greater freedom to optimise the data structure - so you might find OT performs better where fast feedback and short conflicting histories are the norm.

Personally, I enjoy using a modified version of LOGOOT. In the end, what convinced me was, not performance, but greater confidence that my code is correct.

Further reading

It seems incredible that, after 30 years of research, many products still restrict work to a single author or just lose overlapping edits. If, instead, we apply techniques to detect, avoid, and resolve conflicts, we can write good collaborative software.