For the sample application and source code download, please see the main article: Using AvalonEdit (WPF Text Editor)

Document (The Text Model)

So, what is the model of a text editor that has support for complex features like syntax highlighting and folding?
Would you expect to be able to access collapsed text using the document model, given that the text is folded away?
Is the syntax highlighting part of the model?

In my quest for a good representation of the model, I decided on a radical strategy: if it's not a char, it's not in the model!

The main class of the model is ICSharpCode.AvalonEdit.Document.TextDocument. Basically, the document is a StringBuilder with events. However, the Document namespace also contains several features that are useful to applications working with the text editor.

In the text editor, all three controls (TextEditor, TextArea, TextView) have a Document property pointing to the TextDocument instance. You can change the Document property to bind the editor to another document, but please do so only on the outermost control (usually TextEditor); it will inform its child controls about the change. Changing the document only on a child control would leave the outer controls confused.

It is possible to bind two editor instances to the same document; you can use this feature to create a split view.

Simplified definition of TextDocument:

public sealed class TextDocument : ITextSource
{
    public event EventHandler UpdateStarted;
    public event EventHandler<DocumentChangeEventArgs> Changing;
    public event EventHandler<DocumentChangeEventArgs> Changed;
    public event EventHandler TextChanged;
    public event EventHandler UpdateFinished;

    public TextAnchor CreateAnchor(int offset);
    public ITextSource CreateSnapshot();

    public IList<DocumentLine> Lines { get; }
    public DocumentLine GetLineByNumber(int number);
    public DocumentLine GetLineByOffset(int offset);
    public TextLocation GetLocation(int offset);
    public int GetOffset(int line, int column);

    public char GetCharAt(int offset);
    public string GetText(int offset, int length);

    public void BeginUpdate();
    public bool IsInUpdate { get; }
    public void EndUpdate();

    public void Insert(int offset, string text);
    public void Remove(int offset, int length);
    public void Replace(int offset, int length, string text);

    public string Text { get; set; }
    public int LineCount { get; }
    public int TextLength { get; }
    public UndoStack UndoStack { get; }
}

Offsets

In AvalonEdit, an index into the document is called an offset.

Offsets usually represent the position between two characters. The first offset at the start of the document is 0; the offset after the first char in the document is 1. The last valid offset is document.TextLength, representing the end of the document.

This is exactly the same as the 'index' parameter used by methods in the .NET String or StringBuilder classes. Offsets are used because they are dead simple. To get all text between offset 10 and offset 30, simply call document.GetText(10, 20). Just like String.Substring, AvalonEdit usually uses Offset / Length pairs to refer to text segments.
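As a minimal illustration of the Offset / Length convention, the sketch below mimics GetText on a plain .NET string. The OffsetDemo class is hypothetical and not part of AvalonEdit; its bounds checks simply restate the rules above (valid offsets run from 0 to the text length, and a segment must not extend past the end).

```csharp
using System;

// Hypothetical helper illustrating the Offset / Length convention on a plain
// string; this is not AvalonEdit's implementation of GetText.
public static class OffsetDemo
{
    // Returns the text between 'offset' and 'offset + length'.
    // Valid offsets range from 0 to text.Length (inclusive), so the segment
    // must satisfy 0 <= offset and offset + length <= text.Length.
    public static string GetText(string text, int offset, int length)
    {
        if (offset < 0 || offset > text.Length)
            throw new ArgumentOutOfRangeException(nameof(offset));
        if (length < 0 || offset + length > text.Length)
            throw new ArgumentOutOfRangeException(nameof(length));
        // Offset / Length pairs map directly onto String.Substring.
        return text.Substring(offset, length);
    }
}
```

With this helper, the text between offset 10 and offset 30 is OffsetDemo.GetText(text, 10, 30 - 10), and GetText(text, text.Length, 0) returns the empty string at the end offset.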

To easily pass such segments around, AvalonEdit defines the ISegment interface:

public interface ISegment
{
    int Offset { get; }
    int Length { get; } // must be non-negative
    int EndOffset { get; } // must return Offset+Length
}

All TextDocument methods taking Offset/Length parameters also have an overload taking an ISegment instance; I removed those from the code listing above to make it easier to read.
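For illustration, here is one way a small immutable type could satisfy the ISegment contract. The interface is redeclared locally so the sketch compiles without AvalonEdit, and OffsetSegment and SegmentText are hypothetical names, not library types.

```csharp
using System;

// Local copy of the interface shape shown above, so this sketch compiles
// without referencing AvalonEdit.
public interface ISegment
{
    int Offset { get; }
    int Length { get; }     // must be non-negative
    int EndOffset { get; }  // must return Offset + Length
}

// Hypothetical immutable segment type (not part of AvalonEdit).
public readonly struct OffsetSegment : ISegment
{
    public OffsetSegment(int offset, int length)
    {
        if (offset < 0) throw new ArgumentOutOfRangeException(nameof(offset));
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        Offset = offset;
        Length = length;
    }

    // Convenience factory for "from offset A to offset B" ranges;
    // throws if endOffset < startOffset (negative length).
    public static OffsetSegment FromEnds(int startOffset, int endOffset)
        => new OffsetSegment(startOffset, endOffset - startOffset);

    public int Offset { get; }
    public int Length { get; }
    public int EndOffset => Offset + Length;   // the contract from ISegment
}

public static class SegmentText
{
    // Mirrors the idea of an ISegment overload: it simply forwards
    // to the Offset / Length form.
    public static string GetText(string text, ISegment segment)
        => text.Substring(segment.Offset, segment.Length);
}
```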

Lines

Offsets are easy to use, but sometimes you need Line / Column pairs instead. AvalonEdit defines a struct called TextLocation for those.

The document provides the methods GetLocation and GetOffset to convert between offsets and TextLocations. Those are convenience methods built on top of the DocumentLine class.

The TextDocument.Lines collection contains one DocumentLine instance for every line in the document. This collection is read-only to user code and is automatically updated to always* reflect the current document content.

Internally, the DocumentLine instances are arranged in a binary tree that allows for both efficient updates and lookup. Converting between offset and line number is possible in O(lg N) time, and the data structure also updates all offsets in O(lg N) whenever text is inserted/removed.
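To show the lookup half of this idea, the sketch below stores the start offset of every line in a sorted array and binary-searches it to convert between offsets and line/column pairs. LineIndex and its members are hypothetical. Unlike the tree described above, this array has to be rebuilt in O(N) after every edit, which is exactly the cost the tree avoids. The sketch also assumes that only '\n' terminates lines and that line and column numbers are 1-based.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified line index: a sorted array of line start offsets.
// Lookups are O(lg N) via binary search, but the array must be rebuilt in
// O(N) after every edit. Only '\n' is treated as a line terminator, and
// lines/columns are 1-based.
public sealed class LineIndex
{
    private readonly int[] lineStarts;
    private readonly int textLength;

    public LineIndex(string text)
    {
        var starts = new List<int> { 0 };            // line 1 starts at offset 0
        for (int i = 0; i < text.Length; i++)
            if (text[i] == '\n') starts.Add(i + 1);  // next line starts after '\n'
        lineStarts = starts.ToArray();
        textLength = text.Length;
    }

    public int LineCount => lineStarts.Length;

    // Offset -> (line, column), both 1-based.
    public (int Line, int Column) GetLocation(int offset)
    {
        if (offset < 0 || offset > textLength)
            throw new ArgumentOutOfRangeException(nameof(offset));
        int index = Array.BinarySearch(lineStarts, offset);
        // On a miss, BinarySearch returns the bitwise complement of the index
        // of the next larger element; the containing line is the one before it.
        if (index < 0) index = ~index - 1;
        return (index + 1, offset - lineStarts[index] + 1);
    }

    // (line, column) -> offset; the inverse of GetLocation.
    public int GetOffset(int line, int column)
    {
        if (line < 1 || line > LineCount)
            throw new ArgumentOutOfRangeException(nameof(line));
        return lineStarts[line - 1] + column - 1;
    }
}
```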

* tiny exception: it is possible to see the line collection in an inconsistent state inside ILineTracker callbacks. Don't use ILineTracker unless you know what you are doing!

Change Events

Here is the order in which events are raised during a document update:

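BeginUpdate()
    UpdateStarted event is raised

Insert() / Remove() / Replace()
    Changing event is raised
    The document is changed
    Changed event is raised

EndUpdate()
    TextChanged event is raised
    UpdateFinished event is raised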

If the insert/remove/replace methods are called without a call to BeginUpdate(), they will call BeginUpdate() and EndUpdate() to ensure no change happens outside of UpdateStarted/UpdateFinished.

There can be multiple document changes between the BeginUpdate() and EndUpdate() calls. In this case, the events associated with EndUpdate will be raised only once after the whole document update is done.
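One way to get the behavior described above is a nesting counter: only the outermost BeginUpdate/EndUpdate pair raises UpdateStarted, TextChanged and UpdateFinished, while each individual change raises Changing and Changed. MiniDocument is a hypothetical sketch that only supports Insert and uses plain EventArgs instead of DocumentChangeEventArgs; it illustrates the ordering, not AvalonEdit's actual code.

```csharp
using System;
using System.Text;

// Hypothetical minimal document illustrating the update/event ordering.
public sealed class MiniDocument
{
    private readonly StringBuilder text = new StringBuilder();
    private int updateDepth;      // nesting level of BeginUpdate calls
    private bool textChanged;     // did anything change during this update?

    public event EventHandler UpdateStarted;
    public event EventHandler Changing;
    public event EventHandler Changed;
    public event EventHandler TextChanged;
    public event EventHandler UpdateFinished;

    public string Text => text.ToString();
    public bool IsInUpdate => updateDepth > 0;

    public void BeginUpdate()
    {
        // Only the outermost BeginUpdate raises UpdateStarted.
        if (updateDepth++ == 0)
            UpdateStarted?.Invoke(this, EventArgs.Empty);
    }

    public void EndUpdate()
    {
        if (updateDepth == 0)
            throw new InvalidOperationException("EndUpdate without BeginUpdate");
        if (--updateDepth > 0)
            return; // still inside an outer update
        // The EndUpdate events fire once, after the whole update.
        if (textChanged)
        {
            textChanged = false;
            TextChanged?.Invoke(this, EventArgs.Empty);
        }
        UpdateFinished?.Invoke(this, EventArgs.Empty);
    }

    public void Insert(int offset, string newText)
    {
        // Wrapping every change guarantees it happens inside
        // UpdateStarted/UpdateFinished, even without an explicit BeginUpdate.
        BeginUpdate();
        try
        {
            Changing?.Invoke(this, EventArgs.Empty);
            text.Insert(offset, newText);
            textChanged = true;
            Changed?.Invoke(this, EventArgs.Empty);
        }
        finally
        {
            EndUpdate();
        }
    }
}
```

Two inserts inside one explicit BeginUpdate/EndUpdate pair therefore produce a single UpdateStarted, two Changing/Changed pairs, and then one TextChanged followed by one UpdateFinished.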

The UndoStack listens to the UpdateStarted and UpdateFinished events to group all changes into a single undo step.
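The grouping idea can be sketched as a recorder that opens a group on UpdateStarted, collects every change, and closes the group on UpdateFinished, so that a whole update becomes one undo step. UndoGroupRecorder and its methods are hypothetical, changes are represented by plain description strings, and the document events are simulated by calling the handlers directly; AvalonEdit's UndoStack is considerably more involved.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical recorder showing how update events can group changes
// into single undo steps. Not AvalonEdit's UndoStack.
public sealed class UndoGroupRecorder
{
    private readonly Stack<List<string>> undoSteps = new Stack<List<string>>();
    private List<string> currentGroup;

    public int StepCount => undoSteps.Count;

    // Would be attached to the document's UpdateStarted event.
    public void OnUpdateStarted() => currentGroup = new List<string>();

    // Would be attached to the document's Changed event; records a
    // description of each change made during the update.
    public void OnChanged(string description)
    {
        if (currentGroup == null)
            throw new InvalidOperationException("change outside of an update");
        currentGroup.Add(description);
    }

    // Would be attached to UpdateFinished: everything since UpdateStarted
    // becomes one undo step (updates without changes are ignored).
    public void OnUpdateFinished()
    {
        if (currentGroup != null && currentGroup.Count > 0)
            undoSteps.Push(currentGroup);
        currentGroup = null;
    }

    // Undoing pops one whole group; its changes are returned in reverse
    // order, which is the order in which they would have to be reverted.
    public IReadOnlyList<string> Undo()
    {
        var group = undoSteps.Pop();
        group.Reverse();
        return group;
    }
}
```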

TextAnchor

If you are working with the text editor, you will likely run into the problem that you need to store an offset, but want it to adjust automatically whenever text is inserted prior to that offset.

Sure, you could listen to the TextDocument.Changed event and call GetNewOffset on the DocumentChangeEventArgs to translate the offset, but that gets tedious, especially when your object is short-lived and you have to deal with deregistering the event handler at the correct point in time.

A much simpler solution is to use the TextAnchor class. Usage:

TextAnchor anchor = document.CreateAnchor(offset);
ChangeMyDocument();            // placeholder for any edits to the document
int newOffset = anchor.Offset; // reflects the edits made above

The document automatically updates all text anchors, and because it uses weak references to do so, the GC can simply collect an anchor object once you no longer need it.

Moreover, the document can efficiently update a large number of anchors without having to look at each anchor object individually. Updating the offsets of all anchors usually takes only time logarithmic in the number of anchors, and retrieving the TextAnchor.Offset property also runs in O(lg N).

When a piece of text containing an anchor is removed, that anchor is deleted. First, the TextAnchor.IsDeleted property is set to true on all deleted anchors; then the TextAnchor.Deleted events are raised. You cannot retrieve the offset from an anchor that has been deleted.

This deletion behavior can be useful when using anchors to build a bookmark feature, but in other cases you still want to be able to use the anchor. For those cases, set TextAnchor.SurviveDeletion = true.

Note that anchor movement is ambiguous if text is inserted exactly at the anchor's location: does the anchor stay before the inserted text, or does it move after it? The TextAnchor.MovementType property determines which of these two options the anchor chooses. The default value is AnchorMovementType.BeforeInsertion.
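The following naive sketch illustrates the anchor rules described above: shifting on insertion, the movement type at the exact insertion point, deletion (or survival) on removal with IsDeleted set before the Deleted events fire, and weak references so that unused anchors can be collected. AnchorTracker, SimpleAnchor and MovementKind are hypothetical names. Unlike AvalonEdit, this updates every anchor individually in O(N), and the boundary rule (only anchors strictly inside the removed text are deleted) is an assumption of the sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins for AnchorMovementType / TextAnchor; a naive O(N)
// sketch, not AvalonEdit's tree-based implementation.
public enum MovementKind { BeforeInsertion, AfterInsertion }

public sealed class SimpleAnchor
{
    internal int offset;            // current position, maintained by the tracker
    internal SimpleAnchor(int offset) => this.offset = offset;

    public MovementKind MovementType { get; set; } = MovementKind.BeforeInsertion;
    public bool SurviveDeletion { get; set; }
    public bool IsDeleted { get; internal set; }
    public event EventHandler Deleted;

    // The position is unavailable once the anchor has been deleted.
    public int Offset => IsDeleted
        ? throw new InvalidOperationException("Anchor has been deleted.")
        : offset;

    internal void RaiseDeleted() => Deleted?.Invoke(this, EventArgs.Empty);
}

public sealed class AnchorTracker
{
    // Weak references: the tracker does not keep unused anchors alive.
    private readonly List<WeakReference<SimpleAnchor>> anchors =
        new List<WeakReference<SimpleAnchor>>();

    public SimpleAnchor CreateAnchor(int offset)
    {
        var anchor = new SimpleAnchor(offset);
        anchors.Add(new WeakReference<SimpleAnchor>(anchor));
        return anchor;
    }

    // Drops collected or deleted anchors, then returns the remaining live ones.
    private List<SimpleAnchor> LiveAnchors()
    {
        anchors.RemoveAll(w => !w.TryGetTarget(out var a) || a.IsDeleted);
        var live = new List<SimpleAnchor>();
        foreach (var w in anchors)
            if (w.TryGetTarget(out var a)) live.Add(a);
        return live;
    }

    public void OnInsert(int position, int length)
    {
        foreach (var a in LiveAnchors())
        {
            // Anchors after the insertion point always move; anchors exactly
            // at it move only if they are AfterInsertion anchors.
            if (a.offset > position ||
                (a.offset == position && a.MovementType == MovementKind.AfterInsertion))
                a.offset += length;
        }
    }

    public void OnRemove(int position, int length)
    {
        int end = position + length;
        var deleted = new List<SimpleAnchor>();
        foreach (var a in LiveAnchors())
        {
            if (a.offset >= end)
                a.offset -= length;            // after the removed text: shift left
            else if (a.offset > position)      // strictly inside the removed text
            {
                if (a.SurviveDeletion) a.offset = position;
                else { a.IsDeleted = true; deleted.Add(a); }
            }
            // anchors at or before 'position' are unaffected
        }
        // All anchors are marked as deleted first; only then are events raised.
        foreach (var a in deleted) a.RaiseDeleted();
    }
}
```

Under these rules, inserting at an anchor's own offset leaves a BeforeInsertion anchor in place but moves an AfterInsertion anchor to the end of the inserted text.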

If you want to track a segment, you can use the AnchorSegment class which implements ISegment using two text anchors.

TextSegmentCollection

Sometimes it is useful to store a list of segments and be able to efficiently find all segments overlapping with some other segment.
Example: you might want to store a large number of compiler warnings and render squiggly underlines only for those that are in the visible region of the document.

The TextSegmentCollection serves this purpose. When connected to a document, it automatically updates the offsets of all TextSegment instances inside the collection, and it also provides the useful methods FindOverlappingSegments and FindFirstSegmentWithStartAfter. The underlying data structure is a hybrid between the one used for text anchors and an interval tree, so it handles both jobs quickly.
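A deliberately simple stand-in for the two query methods keeps segments sorted by start offset and uses a binary search plus a linear scan instead of an interval tree. Seg and SegmentList are hypothetical, the sketch does not update offsets on document changes, and its boundary rules (touching segments count as overlapping; "start after" means a start offset greater than or equal to the given offset) are assumptions that may not match the real collection.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical segment type used by the sketch.
public sealed class Seg
{
    public Seg(int startOffset, int length, string tag)
    {
        StartOffset = startOffset; Length = length; Tag = tag;
    }
    public int StartOffset { get; }
    public int Length { get; }
    public int EndOffset => StartOffset + Length;
    public string Tag { get; }   // e.g. a compiler warning message
}

public sealed class SegmentList
{
    // Kept sorted by StartOffset so lookups by start can binary-search.
    private readonly List<Seg> segments = new List<Seg>();

    public void Add(Seg s) => segments.Insert(LowerBound(s.StartOffset), s);

    // Index of the first segment whose StartOffset >= offset.
    private int LowerBound(int offset)
    {
        int lo = 0, hi = segments.Count;
        while (lo < hi)
        {
            int mid = lo + (hi - lo) / 2;
            if (segments[mid].StartOffset < offset) lo = mid + 1;
            else hi = mid;
        }
        return lo;
    }

    // First segment starting at or after 'offset', or null if there is none.
    public Seg FindFirstSegmentWithStartAfter(int offset)
    {
        int i = LowerBound(offset);
        return i < segments.Count ? segments[i] : null;
    }

    // All segments overlapping [offset, offset + length]; touching counts.
    // Linear scan: an interval tree would answer this in O(lg N + K).
    public List<Seg> FindOverlappingSegments(int offset, int length)
    {
        int end = offset + length;
        var result = new List<Seg>();
        foreach (var s in segments)
        {
            if (s.StartOffset > end) break;     // sorted: nothing later can overlap
            if (s.EndOffset >= offset) result.Add(s);
        }
        return result;
    }
}
```

For the squiggly-underline example, the query range would be the visible part of the document, and only the returned segments would need to be rendered.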

Thread Safety

The TextDocument class is not thread-safe. It expects to have a single owner thread and will throw an InvalidOperationException when accessed from another thread.

However, there is a single method that is thread-safe: CreateSnapshot(). It returns an immutable snapshot of the document and may be safely called even while the owner thread is concurrently modifying the document. This is very useful for features like a background parser running on its own thread. The overload CreateSnapshot(out ChangeTrackingCheckpoint) also returns a ChangeTrackingCheckpoint for the document snapshot. Once you have two checkpoints, you can call GetChangesTo to retrieve the complete list of document changes that happened between those versions of the document.
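The sketch below combines the ideas in this section under simplifying assumptions: an owner-thread check on normal access, a lock-protected CreateSnapshot that returns an immutable string plus a version number, and a change log from which the changes between two snapshots can be read back. ThreadOwnedDocument, Snapshot and ChangesBetween are hypothetical; the version/change-log pair is only an analogue of ChangeTrackingCheckpoint and GetChangesTo, not their actual implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical document: a single owner thread, plus a thread-safe snapshot.
public sealed class ThreadOwnedDocument
{
    private readonly int ownerThreadId = Thread.CurrentThread.ManagedThreadId;
    private readonly object snapshotLock = new object();
    private readonly List<string> changeLog = new List<string>(); // one entry per change
    private string text = "";

    // Immutable view: .NET strings never change, so sharing one is safe.
    public sealed class Snapshot
    {
        internal Snapshot(string text, int version) { Text = text; Version = version; }
        public string Text { get; }
        public int Version { get; }   // number of changes applied so far
    }

    private void VerifyAccess()
    {
        if (Thread.CurrentThread.ManagedThreadId != ownerThreadId)
            throw new InvalidOperationException("Document accessed from a non-owner thread.");
    }

    public string Text { get { VerifyAccess(); return text; } }

    public void Insert(int offset, string newText)
    {
        VerifyAccess();
        lock (snapshotLock)   // keeps text and change log consistent for snapshots
        {
            text = text.Insert(offset, newText);
            changeLog.Add($"insert {newText.Length} at {offset}");
        }
    }

    // Safe to call from any thread.
    public Snapshot CreateSnapshot()
    {
        lock (snapshotLock)
            return new Snapshot(text, changeLog.Count);
    }

    // Changes made between two snapshots; safe to call from any thread.
    public IReadOnlyList<string> ChangesBetween(Snapshot from, Snapshot to)
    {
        lock (snapshotLock)
            return changeLog.GetRange(from.Version, to.Version - from.Version);
    }
}
```

A background parser would call CreateSnapshot, work on Snapshot.Text without touching the document, and later use the change list to map its results onto the current text.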

Note: although my sample code is provided under the MIT license, ICSharpCode.AvalonEdit itself is provided under the terms of the GNU LGPL.