Hollow provides the mechanisms for maintaining the state of data, serializing data, and reading and accessing data, along with a comprehensive set of tooling for manipulating and investigating data.

Hollow does not provide or specify the infrastructure for actually disseminating the produced blobs from the producer to the consumers. This section will describe the infrastructure needed for a usual production deployment of Hollow. It’s entirely possible to wield the Hollow framework in ways which differ from the usage described here.

The Producer Cycle

Generally, a producer runs a repeating cycle. During each cycle, the producer goes through two distinct phases:

  1. Adding all of the data for a state to a HollowWriteStateEngine
  2. Writing blobs from a HollowWriteStateEngine

Two method calls on the HollowWriteStateEngine transition between these phases: prepareForNextCycle() begins the first phase, and prepareForWrite() begins the second.
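
In code, a cycle might look like the following sketch. Here running, queryAllMovies(), and writeBlobs() are hypothetical stand-ins for your own loop control, source of truth, and blob-publishing step (described below):

HollowWriteStateEngine writeEngine = new HollowWriteStateEngine();
HollowObjectMapper mapper = new HollowObjectMapper(writeEngine);

while(running) {
    writeEngine.prepareForNextCycle();     // phase 1: begin adding data

    for(Movie movie : queryAllMovies())    // hypothetical source of truth
        mapper.add(movie);

    writeEngine.prepareForWrite();         // phase 2: ready to write blobs

    writeBlobs(writeEngine);               // hypothetical publish step
}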

Each new generated state is assigned a unique, monotonically increasing 64-bit identifier. State identifiers impose an ordering over states. Later states have greater identifiers than earlier states. The identifier is used to both identify the state and index the blobs in the published blob store.

State Identifiers

Tip: it is useful to derive these identifiers based on the current timestamp.
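
For a single producer, a millisecond timestamp satisfies both requirements; a small guard keeps identifiers strictly increasing even if the clock moves backwards (previousStateVersion is the identifier from the prior cycle):

long stateVersion = Math.max(previousStateVersion + 1, System.currentTimeMillis());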

At the end of each cycle, the producer publishes up to three types of blobs for the resulting state -- a snapshot, a delta, and a reverse delta.

Reverse Deltas

Just as a delta transitions between an earlier state and an adjacent later state, a reverse delta transitions between a later state and an earlier adjacent state. A reverse delta is created by simply calling writeReverseDelta(OutputStream) on a HollowBlobWriter.
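
All three blob types are written through the same HollowBlobWriter; the OutputStreams here are assumed to be destined for the blob store:

HollowBlobWriter writer = new HollowBlobWriter(writeEngine);

writer.writeSnapshot(snapshotStream);            // the entire new state
writer.writeDelta(deltaStream);                  // previous state -> new state
writer.writeReverseDelta(reverseDeltaStream);    // new state -> previous state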

Storing the Blobs

Blobs are published to a file store which is accessible by consumers. From this blob store, consumers must be able to query for and retrieve blobs in the following ways:

  • Snapshots: Must be queryable based on the state identifier. If the blob store is queried for a snapshot with an identifier which does not exist, the snapshot with the greatest identifier prior to the queried identifier should be retrieved (see the sketch after this list).
  • Deltas: Must be queryable based on the state identifier to which a delta should be applied.
  • Reverse Deltas: Must be queryable based on the state identifier to which a reverse delta should be applied.
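
The snapshot rule amounts to a floor query over the published identifiers. A minimal sketch, assuming hypothetical in-memory indexes of the published blob files (null checks omitted):

TreeMap<Long, File> snapshotIndex = loadSnapshotIndex();   // hypothetical index
Map<Long, File> deltaIndex = loadDeltaIndex();             // hypothetical index

// snapshots: exact match if present, otherwise the snapshot with the
// greatest identifier prior to the queried identifier
File snapshot = snapshotIndex.floorEntry(desiredVersion).getValue();

// deltas (and reverse deltas): keyed by the identifier of the state
// to which they apply, so an exact lookup suffices
File delta = deltaIndex.get(desiredVersion);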

Announcing the State

Once the necessary transitions to bring clients up to date have been written to the blob store, the availability of the state must be announced to clients. In practice, this means the producer maintains and updates a centralized location indicating the identifier of the currently available state.

When the announced state is updated, it is usually desirable for consumers to detect the update as quickly as possible. This can be accomplished either via a push notification to all consumers or via frequent polling by consumers.
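
A minimal announcement mechanism can be a single well-known location which the producer overwrites each cycle and consumers poll; the path here is purely illustrative:

// producer: overwrite the announcement after the blobs are safely stored
Files.write(Paths.get("/hollow/movies/announced.version"),
            Long.toString(stateVersion).getBytes(StandardCharsets.UTF_8));

// consumer: poll the same location, refreshing when the identifier changes
long announced = Long.parseLong(new String(
        Files.readAllBytes(Paths.get("/hollow/movies/announced.version")),
        StandardCharsets.UTF_8).trim());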

Restoring At Startup

The examples of writing blobs thus far have assumed that the same HollowWriteStateEngine is held in memory for the duration of a dataset’s delta chain. However, this isn’t always possible; the producer will need to be restarted from time to time due to deployment or other operational circumstances.

To produce a delta between a state written by one HollowWriteStateEngine and a state written by another, the producer can restore the prior state at startup, which allows both a delta and a reverse delta to be produced:

// load the most recent state (e.g. read the latest snapshot from the blob
// store; latestSnapshotInputStream is a hypothetical stream over that blob)
HollowReadStateEngine readEngine = new HollowReadStateEngine();
new HollowBlobReader(readEngine).readSnapshot(latestSnapshotInputStream);

HollowWriteStateEngine writeEngine = new HollowWriteStateEngine();

// initialize the data model before restoring
HollowObjectMapper mapper = new HollowObjectMapper(writeEngine);
mapper.initializeTypeState(Movie.class);

writeEngine.restoreFrom(readEngine);

Once we have restored the prior state, we can produce a delta from our producer's first cycle. The delta will be applicable to any consumers which are on the state from which we restored.

Note that prior to restore, a HollowWriteStateEngine must be initialized with the schemas for its data model. This is because the restore operation does not require that schemas exactly match between the restored state and the new state; it is legal to add or remove types and fields.

When a delta is produced from a restored state after schemas have been updated, a record in the new state which matches a record in the prior state across all fields common to both schemas will be assigned the same ordinal and considered unmodified. If multiple records match across the common fields, the pairing between prior and new records is chosen arbitrarily.

Initializing Before Restore

A HollowWriteStateEngine's data model may be initialized:

  • via the HollowObjectMapper by calling initializeTypeState() with all top-level classes
  • via a set of schemas loaded from a text file using the HollowSchemaParser and HollowWriteStateCreator (sketched below)
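
The second approach might look like the following. HollowSchemaParser and HollowWriteStateCreator are the real entry points, but the exact overloads used here are assumptions, and readSchemaFile() is a hypothetical loader:

String schemaText = readSchemaFile("movies-schemas.txt");

List<HollowSchema> schemas = HollowSchemaParser.parseCollectionOfSchemas(schemaText);

HollowWriteStateEngine writeEngine = new HollowWriteStateEngine();
HollowWriteStateCreator.populateStateEngineWithTypeWriteStates(writeEngine, schemas);

writeEngine.restoreFrom(readEngine);   // now safe to restore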

Rolling Back

While producing a new state, it is possible to roll back a HollowWriteStateEngine to the state it was in the last time prepareForNextCycle() was called:

public void runTheCycle(HollowWriteStateEngine writeEngine) {
    writeEngine.prepareForNextCycle();
    try {
        addAllData(writeEngine);
    } catch(Throwable unexpected) {
        // discard every addition/removal made during this cycle
        writeEngine.resetToLastPrepareForNextCycle();
    }
}

When this method is called, it’s as if none of the additions/removals since the last call to prepareForNextCycle() ever happened. This action is available right up until the next call to prepareForNextCycle().

It’s best practice to wrap the code which adds data to a state engine in a try/catch block as shown above. This guards against any scenario in which the producer encounters an unexpected Exception due to an unforeseen bug.

Validating Data

It likely makes sense to perform some basic validation on your produced data states before announcing them to clients. This usually takes the form of loading the data into a HollowReadStateEngine on the producer, then gathering and checking some heuristics-based metrics on the data before announcement. These validation rules will be specific to the semantics of the dataset. If a problem is detected, send an alert and roll back the write engine, rather than announcing. This way a delta may be produced from the previous good state.
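
A minimal sketch of this pattern, where snapshotInputStream re-reads the snapshot written this cycle, and MINIMUM_EXPECTED_MOVIES, sendAlert(), and announce() are hypothetical, dataset-specific pieces:

HollowReadStateEngine validationEngine = new HollowReadStateEngine();
new HollowBlobReader(validationEngine).readSnapshot(snapshotInputStream);

int movieCount = validationEngine.getTypeState("Movie")
                                 .getPopulatedOrdinals().cardinality();

if(movieCount < MINIMUM_EXPECTED_MOVIES) {
    sendAlert("movie count dropped to " + movieCount);
    writeEngine.resetToLastPrepareForNextCycle();   // roll back rather than announce
} else {
    announce(stateVersion);
}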

Consumer Framework

Data consumers keep their local copy of a dataset current by ensuring that their state engine is always at the latest announced data state. Consumers can arrive at a particular data state in a couple of different ways:

  • At initialization time, they will load a snapshot, which is an entire copy of the dataset to be forklifted into memory.
  • After initialization time, they will keep their local copy of the dataset current by applying delta transitions, which are the differences between adjacent data states (see the sketch below).
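
Under the hood, these two paths correspond to operations on a HollowBlobReader backed by the consumer's HollowReadStateEngine:

HollowReadStateEngine readEngine = new HollowReadStateEngine();
HollowBlobReader reader = new HollowBlobReader(readEngine);

reader.readSnapshot(snapshotInputStream);   // initialization: forklift the full state
reader.applyDelta(deltaInputStream);        // steady state: apply one adjacent delta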

The HollowClient encapsulates the details of initializing and keeping a dataset up to date. In order to accomplish this task, a few infrastructure hooks must be injected:

public HollowClient(HollowBlobRetriever blobRetriever,
                    HollowAnnouncementWatcher announcementWatcher,
                    HollowUpdateListener updateListener,
                    HollowAPIFactory apiFactory,
                    HollowObjectHashCodeFinder hashCodeFinder,
                    HollowClientMemoryConfig memoryConfig)

Let's examine each of the hooks injected into the HollowClient:

  • HollowBlobRetriever: The interface to the blob store. This is the only hook for which a custom implementation is required; each of the other hooks has a default implementation which may be used.
  • HollowAnnouncementWatcher: Provides an interface to the state announcement mechanism. Often, announcement polling logic is encapsulated inside implementations.
  • HollowUpdateListener: Provides hooks so that actions may be taken during and after updates (e.g. indexing).
  • HollowAPIFactory: Allows users to specify a custom-generated Hollow API to use.
  • HollowClientMemoryConfig: Defines advanced settings related to object longevity and double snapshots.

Each time the identifier of the currently announced state changes, triggerRefresh() should be called on the HollowClient. This will bring the data up to date.
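
A minimal setup might look like the following; the single-argument convenience constructor (defaulting the remaining hooks) is an assumption here, and MyBlobStoreRetriever stands in for your custom HollowBlobRetriever:

HollowBlobRetriever retriever = new MyBlobStoreRetriever();   // your blob store implementation
HollowClient client = new HollowClient(retriever);

client.triggerRefresh();   // loads a snapshot on the first call; applies deltas thereafter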

Pinning Consumers

Mistakes happen. What's important is that we can recover from them quickly: if you accidentally publish bad data, you should be able to revert the change immediately. If you give your HollowAnnouncementWatcher implementation an alternate location from which to read the announcement, one which overrides the announcement written by the producer, then you can use it to quickly force clients back to any arbitrary state in the past. We call setting a state version in this alternate location pinning the consumers.

Implementing a pinning mechanism is extremely useful and highly recommended. You can operationally reverse data issues immediately upon discovery, so that symptoms go away while you diagnose exactly what went wrong. This can save an enormous amount of stress and money.
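
A sketch of the pin check inside a HollowAnnouncementWatcher implementation; it assumes getLatestVersion() is the method your watcher overrides to report the announced version, and readPinnedVersion()/readAnnouncedVersion() are hypothetical reads against the two announcement locations:

@Override
public long getLatestVersion() {
    Long pinned = readPinnedVersion();     // alternate, operator-controlled location
    if(pinned != null)
        return pinned;                     // the pin overrides the producer's announcement
    return readAnnouncedVersion();         // the normal announcement
}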

Unpinning

If you've pinned consumers due to a data issue, it's probably not desirable to simply 'unpin' them after the root cause is addressed. Instead, restart the producer and instruct it to restore from the pinned state. It should then produce a delta which skips over all of the bad states. Only unpin after the delta leading from the pinned version to a bad state has been overwritten with a delta leading from the pinned version to a good state.

Blob Namespaces

Every so often, it may be necessary to make changes to the data model which are incompatible with prior versions. In this case, a producer for the older data model should run in parallel with a producer for the newer, incompatible data model.

Incompatible Data Model Changes

For details about which changes are and are not backwards compatible, see Maintaining Backwards Compatibility.

Each producer should write its blobs to a different namespace, so that older consumers can read from the old data model, and newer consumers can read from the newer data model. This will result in parallel delta chains created in these separate namespaces. Once all consumers are upgraded and reading from the newer data model, the older producer can be shut down.

The method of namespacing will vary with the chosen data persistence technology.