This section describes the usage of some of the tooling which ships with Hollow, but the tools described here are by no means a comprehensive accounting of the things you can do with your data once it's Hollow. We hope that you'll find it straightforward to use the basic building blocks provided by the Hollow framework in different ways to create new tooling — and then contribute back any tools which you develop for your use case and find useful.

Insight Tools

History tool

Hollow provides the ability to retain, in memory, the changes in a dataset over many states, and to easily access historical data. This is accomplished via the HollowHistory class:

public HollowHistory(HollowReadStateEngine initialHollowStateEngine, 
                     long initialVersion, 
                     int maxHistoricalStatesToKeep)

State Versioning

The initialVersion parameter above should be a unique value identifying the state.

The HollowHistory should be configured with the primary keys of records for which we are interested in tracking history. For example, using our Movie/Actor example from the Getting Started guide, we may specify the following configuration:

HollowHistory history = new HollowHistory(readEngine, 1, 1000);
HollowHistoryKeyIndex historyIdx = history.getKeyIndex();

historyIdx.addTypeIndex("Movie", "id");
historyIdx.indexTypeField("Movie", "id");

historyIdx.addTypeIndex("Actor", "actorId");

Notice there are two types of calls available to the HollowHistoryKeyIndex:

  • The addTypeIndex() call specifies the primary key for a type for which we want to be able to view historical changes over time. Primary keys may be defined over multiple fields; the final parameter in the addTypeIndex() call is a vararg (see the sketch after this list).
  • The indexTypeField() call specifies an individual primary key field over which we want to be able to search for historical changes over time.
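
For example, a compound primary key over multiple fields might be declared as follows (a hypothetical sketch; it assumes Movie also had a country reference, as in the Indexing / Querying section below):

historyIdx.addTypeIndex("Movie", "id", "country.id.value");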

Primary Keys

The HollowHistory will, by default, automatically configure any primary keys which are defined in the Object schemas of your dataset. However, the calls to indexTypeField() will not be automatically configured.

Once instantiated and configured, the HollowHistory should be notified each time the state engine is transitioned via the deltaOccurred(long newVersion) method. The HollowHistory will track the entire dataset for each state through which the state engine is transitioned.
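
For example, a consumer applying a delta might update the history immediately afterward (a minimal sketch; the delta stream and version are assumed to come from your blob store and announcement mechanism):

HollowBlobReader reader = /// a blob reader wrapping the same state engine
InputStream deltaStream = /// a stream of the delta blob
long newVersion = /// the version of the destination state

reader.applyDelta(deltaStream);
history.deltaOccurred(newVersion);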

This historical data is maintained by retaining and indexing all of the changes for the delta chain in memory. Because only changes over time are retained, rather than complete states, a great length of history can often be held in memory.

Hollow includes a ready-made UI which can be applied to a HollowHistory for any dataset. The included UI clearly displays the changes which occur between adjacent states as the state engine transitions through a delta chain, putting the benefits of indexed historical data retention directly at users' fingertips.

The HollowHistoryUI class in the hollow-diff-ui project is instantiated using a HollowHistory and a base URL path. Incoming requests should be sent to the handle method:

public boolean handle(String target, 
                      HttpServletRequest req, 
                      HttpServletResponse resp) throws IOException
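
For example, the UI might be wired into an existing servlet like this (a minimal sketch; HistoryUIServlet is a hypothetical class, and the base URL path must match the servlet's mapping):

public class HistoryUIServlet extends HttpServlet {

    private final HollowHistoryUI historyUI = /// a configured HollowHistoryUI

    @Override
    protected void service(HttpServletRequest req, HttpServletResponse resp) 
                                                          throws IOException {
        if(!historyUI.handle(req.getPathInfo(), req, resp))
            resp.sendError(HttpServletResponse.SC_NOT_FOUND);
    }
}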

The HollowHistoryUI can be used in the context of an existing web container, or can be invoked via the included HollowHistoryUIServer, which uses the Jetty HTTP Servlet Server:

HollowHistory history = /// set up the history

HollowHistoryUI ui = new HollowHistoryUI("", history);
HollowHistoryUIServer server = new HollowHistoryUIServer(ui, 8080);

server.start();
server.join();

While the above code is running, you can point a browser to http://localhost:8080 to explore the history.

Right out of the box, the history tool provides the ability to get a bird’s eye view of all of the changes a dataset goes through over time, while simultaneously allowing for specific queries to see exactly how individual records change as the dataset transitions between states. The history tool has proven to be enormously beneficial when investigating data issues in production scenarios. When something looks incorrect, it’s easy to pinpoint exactly what changed when, which can vastly expedite data corrections and eliminate hours of potential detective work.

Diff Tool

Just as the Hollow history tool UI makes the differences between any two adjacent states in a delta chain readily accessible, the Hollow diff tool is used to investigate the differences between any two arbitrary data states, even those which may exist in different delta chains.

This is especially useful as a step in a regular release cadence, as the differences between data states produced, for example, in a test environment and production environment can be evaluated at a glance. Sometimes, unintended consequences of code updates may be discovered this way, which prevents production issues before they happen.

Initiating a diff between two data states is accomplished by loading both states into separate HollowReadStateEngines in memory, and then instantiating a HollowDiff and configuring it with the primary keys of types to diff. For our Movie/Actor example:

HollowReadStateEngine testData = /// load test data
HollowReadStateEngine prodData = /// load prod data

HollowDiff diff = new HollowDiff(testData, prodData);
diff.addTypeDiff("Movie", "id");
diff.addTypeDiff("Actor", "actorId");

diff.calculateDiffs();

A diff is calculated by matching records of the same type based on defined primary keys. The unmatched records in both states are tracked, and detailed differences between field values in matching pairs are also tracked.

Primary Keys

The HollowDiff will, by default, automatically configure any primary keys which are defined in the Object schemas of your dataset.

Hollow includes a ready-made UI which can be applied to a HollowDiff. The HollowDiffUI class can be used in the context of an existing web container, or can be invoked via the HollowDiffUIServer, which uses the Jetty HTTP Servlet Server:

HollowDiff diff = /// build the diff

HollowDiffUIServer server = new HollowDiffUIServer(8080);
server.start();

server.addDiff("diff", diff);

server.join();

While the above code is running, you can point a browser to http://localhost:8080 to explore the diff.

Heap Usage Analysis

One of the most important considerations when dealing with in-memory datasets is the heap utilization of that dataset on consumer machines. Hollow provides a number of methods to analyze this metric.

Given a loaded HollowReadStateEngine, it is possible to iterate over each type and gather statistics about its approximate heap usage. This is done in the following example:

HollowReadStateEngine stateEngine = /// a populated state engine

long totalApproximateHeapFootprint = 0;

for(HollowTypeReadState typeState : stateEngine.getTypeStates()) {
    String typeName = typeState.getSchema().getName();
    long heapCost = typeState.getApproximateHeapFootprintInBytes();
    System.out.println(typeName + ": " + heapCost);
    totalApproximateHeapFootprint += heapCost;
}

System.out.println("TOTAL: " + totalApproximateHeapFootprint);

As shown above, information can be gathered about the total heap footprint, and also about the heap footprint of individual types. This information can be helpful in identifying optimization targets. This technique can also be used to identify how the heap cost of individual types changes over time, which can provide early warning signs about optimizations which should be targeted proactively.

Usage Tracking

Hollow can track which records are accessed, and this usage can be investigated at runtime. By default, this functionality is turned off, but it can be enabled by injecting a HollowSamplingDirector into a Hollow API in a running instance. You can use the TimeSliceSamplingDirector implementation, which will by default record every access which happens during 1ms out of every second:

MovieAPI api = /// a custom-generated API

TimeSliceSamplingDirector samplingDirector = new TimeSliceSamplingDirector();
samplingDirector.startSampling();

api.setSamplingDirector(samplingDirector);

Once this is enabled, and some time has passed for samples to be gathered, the results can be collected for analysis:

for(SampleResult result : api.getAccessSampleResults()) {
    if(result.getNumSamples() > 0)
        System.out.println(result.getIdentifier() + ": " + 
                                                  result.getNumSamples());
}

Transitive Set Traverser

The TransitiveSetTraverser can be used to find the child and parent references of a selected set of records. We start with an initial selection of records by ordinal, represented as a Map<String, BitSet>. Each entry in this map indicates a type, plus the ordinals of the selected records of that type:

Map<String, BitSet> selection = new HashMap<String, BitSet>();

HollowPrimaryKeyIndex movieIdx = /// an index on ("Movie", "id")

/// select the movies with IDs 1 and 6.
BitSet selectedMovies = new BitSet();
selectedMovies.set(movieIdx.getMatchingOrdinal(1));
selectedMovies.set(movieIdx.getMatchingOrdinal(6));

selection.put("Movie", selectedMovies);

We can add the references, and the transitive references of our selection. After the following call returns, our selection will be augmented with these matches:

TransitiveSetTraverser.addTransitiveMatches(readEngine, selection);

Transitive References

If A references B, and B references C, then A transitively references C

Given a selection, we can also add any records which reference anything in the selection. This is essentially the opposite of the operation above; it can be said that addTransitiveMatches traverses down, while addReferencingOutsideClosure traverses up. After the following call returns, our selection will be augmented with these matches:

TransitiveSetTraverser.addReferencingOutsideClosure(readEngine, selection);

Dataset Manipulation Tools

Filtering

Sometimes, a dataset will be of interest to multiple different types of consumers, but not all consumers may be interested in all aspects of a dataset. In these cases, it’s possible to omit certain types and fields from a client’s view of the data. This is typically done to tailor a consumer’s heap footprint and startup time costs based on their data needs.

Using our Movie/Actor example above, if there was a consumer which was interested in Movie records, but not Actor records, that consumer might construct a consumer-side data filter configuration in the following way:

HollowFilterConfig config = new HollowFilterConfig(true);
config.addField("Movie", "actors");
config.addType("ListOfActor");
config.addType("Actor");

The boolean true parameter in the constructor above indicates that this is an exclusion filter. We could accomplish the same goal using an inclusion filter:

HollowFilterConfig config = new HollowFilterConfig(false);
config.addField("Movie", "id");
config.addField("Movie", "title");
config.addField("Movie", "releaseYear");
config.addType("String");

The difference between these two configurations is how the filter behaves as new types and fields are added to the data model. The exclusion filter will not exclude newly added types and fields by default, whereas the inclusion filter will not include them by default.

A filter configuration is applied to a state engine at read time:

HollowBlobReader reader = /// a blob reader
InputStream stream = /// a stream of the snapshot
HollowFilterConfig config = /// the filter configuration

reader.readSnapshot(stream, config);

Combining

The HollowCombiner is used to copy data from one or more hollow datasets, each contained in a HollowReadStateEngine, into a single HollowWriteStateEngine. If all of the inputs share the same data model, the following is sufficient to combine them:

HollowReadStateEngine input1 = /// an input
HollowReadStateEngine input2 = /// another input

HollowCombiner combiner = new HollowCombiner(input1, input2);
combiner.combine();

HollowWriteStateEngine combined = combiner.getCombinedStateEngine();

By default, the combiner will copy all records from all types from the inputs to the output. We can direct the combiner to exclude certain records from copying using a HollowCombinerCopyDirector. The interface for a HollowCombinerCopyDirector allows for making decisions about copying individual records during a combine operation by implementing the following method:

public boolean shouldCopy(HollowTypeReadState typeState, int ordinal);

If this method returns false, then the copier will not attempt to directly copy the matching record. However, if the matching record is referenced via another record for which this method returns true, then it will still be copied regardless of the return value of this method.
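
For example, a director which skips all Actor records during the copy might look like the following (a minimal sketch):

HollowCombinerCopyDirector director = new HollowCombinerCopyDirector() {
    @Override
    public boolean shouldCopy(HollowTypeReadState typeState, int ordinal) {
        /// don't directly copy Actor records; they may still be copied
        /// if referenced by a record which is copied
        return !"Actor".equals(typeState.getSchema().getName());
    }
};

HollowCombiner combiner = new HollowCombiner(director, input1, input2);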

The most broadly useful provided implementation of the HollowCombinerCopyDirector is the HollowCombinerExcludePrimaryKeysCopyDirector, which can be used to specify record exclusions by primary key. For example, if we wanted to create a copy of a state engine with the Movie records with ids 100 and 125 excluded:

HollowReadStateEngine input = /// an input
HollowPrimaryKeyIndex idx = new HollowPrimaryKeyIndex(input, "Movie", "id");

HollowCombinerExcludePrimaryKeysCopyDirector director = 
                          new HollowCombinerExcludePrimaryKeysCopyDirector();

director.excludeKey(idx, 100);
director.excludeKey(idx, 125);

HollowCombiner combiner = new HollowCombiner(director, input);
combiner.combine();

HollowWriteStateEngine result = combiner.getCombinedStateEngine();

It’s possible that while combining two inputs, both may have a record of the same type with the same primary key. This violation of the uniqueness constraint of a primary key can be avoided by informing the combiner of the primary keys in a data model prior to the combine operation:

HollowCombiner combiner = new HollowCombiner(input1, input2);

combiner.setPrimaryKeys(
        new PrimaryKey("Movie", "id"),
        new PrimaryKey("Actor", "actorId")
);

combiner.combine();

If multiple records exist in the inputs matching a single value for any of the supplied primary keys, then only one such record will be copied to the output. The specific record which is copied will be the one from the input which was supplied earliest in the constructor of the HollowCombiner. Further, if any record references another record which was omitted as a duplicate under this rule, then that reference is remapped in the output state to the matching record which was chosen for inclusion.

Splitting

A single dataset can be sharded into multiple datasets using a HollowSplitter. The HollowSplitter takes a HollowSplitterCopyDirector, which indicates:

  • top level types to split,
  • the number of shards to create, and
  • the shard to which individual records should be sent.

Top Level Types

Top level types are those which are not referenced by any other types. In our Movie/Actor example, Movie is a top-level type, but Actor is not.

Two default implementations of HollowSplitterCopyDirector are available:

  • HollowSplitterOrdinalCopyDirector
  • HollowSplitterPrimaryKeyCopyDirector

These directors will split top-level types among a specified number of shards either by ordinals or primary keys, respectively. When splitting by ordinal, a record with a specific primary key may jump between shards when it is modified, while with the primary key director a specific primary key will consistently hash to the same shard.

Our Movie/Actor example may use the splitter to split a dataset into four shards with the following invocation:

HollowReadStateEngine stateEngine = /// a state engine

HollowSplitterCopyDirector director = 
                            new HollowSplitterOrdinalCopyDirector(4, "Movie");

HollowSplitter splitter = new HollowSplitter(director, stateEngine);
splitter.split();


for(int i=0; i<4; i++) {
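    /// each shard is an independent write state engine, e.g. for publishing as a separate blob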
    HollowWriteStateEngine shard = splitter.getOutputShardStateEngine(i);
}

State Manipulation Tools

Patching

Using the HollowWriteStateEngine’s restore capability, it’s possible to produce deltas forever, so that consumers never have to load a snapshot after initialization. However, if environmental hiccups cause a producer to fail to publish a delta, or if a delta is lost, or if it’s desired to publish a delta between non-adjacent states, then the HollowStateDeltaPatcher may be used to produce deltas between two arbitrary states within the same delta chain.

The HollowStateDeltaPatcher must produce two delta transitions to create a transition between arbitrary states. This is because non-adjacent states may have different records occupying the same ordinals. Since an ordinal may not be removed and repopulated across a single delta transition, the patcher must create an intermediate state in which the modified records do not share any ordinals.

See the HollowStateDeltaPatcher javadocs for usage details.

Compacting

It is possible to produce delta chains which extend over many thousands of states. If during this delta chain an especially large delta happens for a specific type, it’s possible that many ordinal holes will be present in that type. If over time multiple types go through especially large deltas, this can have an impact on a dataset’s heap footprint.

To reclaim heap space occupied by ordinal holes, the HollowCompactor may be used to move records off of the high end of the ordinal space into these holes. This is accomplished by producing deltas which only include removals and additions of identical records allocated to more optimal ordinals. See the HollowCompactor javadocs for usage details.

Indexing / Querying

Primary Keys

In our Movie/Actor example from the Getting Started guide, we saw that we can easily create a HollowPrimaryKeyIndex which will allow us to query for Movie records by id:

HollowPrimaryKeyIndex idx = 
                      new HollowPrimaryKeyIndex(readEngine, "Movie", "id");
idx.listenForDeltaUpdates();

In that example, the primary key was defined for Movie as its id field. A primary key can also be defined over multiple and/or hierarchical fields. Imagine that Movie additionally had a country field defined in its schema, and that across countries, Movie ids may be duplicated, but that there will never exist two Movie records with the same id and country:

public class Movie {
    long id;
    Country country;
    String title;
    int releaseYear;
}


public class Country {
    String id;
    String name;
}

A HollowPrimaryKeyIndex can be defined with a primary key consisting of both fields:

HollowPrimaryKeyIndex idx = 
            new HollowPrimaryKeyIndex(readEngine, "Movie", "id", "country.id.value");
idx.listenForDeltaUpdates();

And to query for a Movie based on its id and country:

int movieOrdinal = idx.getMatchingOrdinal(2, "US");
if(movieOrdinal != -1) {
    MovieHollow movie = movieApi.getMovieHollow(movieOrdinal);
    System.out.println("Found Movie: " + movie._getTitle()._getValue());
}

Notice that Movie’s country field in the above example is actually a REFERENCE field. The defined key includes the id of the movie, and the value of the id String of the referenced country. We denote this traversal using dot notation in the primary key definition. The field definitions can be multiple references deep.

The requirement for a primary key definition is that no duplicates should exist for the defined combination of fields. If this rule is violated, an arbitrary match will be returned for queries when multiple matches exist.

Primary Key Violations

Violations of the "no duplicate" primary key rule can be detected using the getDuplicateKeys() method on a HollowPrimaryKeyIndex, which returns a Collection<Object[]>. If no duplicate keys exist, the returned Collection will be empty. If they do, the returned values will indicate the keys for which duplicate records exist.
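
For example, a producer might verify this invariant after indexing (a minimal sketch):

Collection<Object[]> duplicateKeys = idx.getDuplicateKeys();

if(!duplicateKeys.isEmpty()) {
    for(Object[] key : duplicateKeys)
        System.out.println("Duplicate key: " + Arrays.toString(key));
    throw new IllegalStateException("Movie primary key violated");
}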

If a HollowPrimaryKeyIndex will be retained for a long duration, it should be kept up to date as deltas are applied to the underlying HollowReadStateEngine. This is accomplished with a single call to the listenForDeltaUpdates() method after instantiation.

Detaching Primary Key Indexes

If listenForDeltaUpdates() is called on a primary key index, then it cannot be garbage collected. If you intend to drop an index which is listening for updates, first call detachFromDeltaUpdates() to prevent a memory leak.
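
For example (a minimal sketch):

HollowPrimaryKeyIndex idx = 
                      new HollowPrimaryKeyIndex(readEngine, "Movie", "id");
idx.listenForDeltaUpdates();

/// ... the index stays current across many delta transitions ...

/// before dropping the last reference to the index:
idx.detachFromDeltaUpdates();
idx = null;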

Indexes which are listening for delta updates are updated after a dataset is updated. In the brief interim time between when a dataset is updated and the index is updated, the index will point to the ghost records located at tombstoned ordinals. This helps guarantee that all in-flight operations will observe correct data.

Hash Indexes

It is sometimes desirable to index records by fields other than primary keys. The HollowHashIndex allows for indexing records by fields or combinations of fields for which values may match multiple records, and records may match multiple values.

In our Movie/Actor example, we may want to index movies by their starring actors:

HollowHashIndex idx = 
            new HollowHashIndex(readEngine, "Movie", "", "actors.element.actorId");

The HollowHashIndex expects in its constructor arguments a query start type, a select field, and a set of match fields. The constructor arguments above indicate that queries will start with the Movie type, select the root of the query (indicated by the empty string), and match the actorId of any Actor record in the movie's actors list.

To query this index:

HollowHashIndexResult result = idx.findMatches(102);

if(result != null) {
    System.out.println("Found matches: " + result.numResults());

    HollowOrdinalIterator iter = result.iterator();
    int matchedOrdinal = iter.next();
    while(matchedOrdinal != HollowOrdinalIterator.NO_MORE_ORDINALS) {
        MovieHollow movie = api.getMovieHollow(matchedOrdinal);
        System.out.println("Starred in: " + movie._getTitle()._getValue());
        matchedOrdinal = iter.next();
    }
}

Alternatively, if the data model included the nationality of Actors, and we needed to index Actors by nationality and the titles of movies in which they starred:

HollowHashIndex idx = 
            new HollowHashIndex(readEngine, "Movie", "actors.element",
                                            "title.value",
                                            "actors.element.nationality.id");

In this case, the query start type is still Movie, but we’re selecting related Actors. Matches are made based on the Movie’s title and the Actor’s nationality. Using this index, one can query for Brazilian actors who starred in movies titled “Narcos”:

HollowHashIndexResult result = idx.findMatches("Narcos", "BR");

if(result != null) {
    HollowOrdinalIterator iter = result.iterator();
    int matchedOrdinal = iter.next();
    while(matchedOrdinal != HollowOrdinalIterator.NO_MORE_ORDINALS) {
        ActorHollow actor = api.getActorHollow(matchedOrdinal);
        System.out.println("Matched actor: " + 
                                      actor._getActorName()._getValue());
        matchedOrdinal = iter.next();
    }
}

The HollowHashIndex does not yet have a facility for listening for delta updates. If an index is necessary across multiple states, currently the index must be recreated on each update.
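
For example, a consumer needing the index across states might simply rebuild it after each delta is applied (a minimal sketch):

/// after each delta is applied to readEngine:
idx = new HollowHashIndex(readEngine, "Movie", "", "actors.element.actorId");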