Data Ingestion¶
Hollow includes a few ready-made data ingestion mechanisms. Additionally, custom data ingestion mechanisms can be created relatively easily using the Low Level Input API.
HollowObjectMapper¶
When using a HollowProducer
, each call to state.add(obj)
is delegated to a HollowObjectMapper
. The HollowObjectMapper
is used to add POJOs into a HollowWriteStateEngine
:
HollowWriteStateEngine engine = /// a state engine
HollowObjectMapper mapper = new HollowObjectMapper(writeEngine);
for(Movie movie : movies)
writeEngine.add(movie);
The HollowObjectMapper
can also be used to initialize the data model of a HollowWriteStateEngine
without adding any actual data:
HollowWriteStateEngine engine = /// a state engine
HollowObjectMapper mapper = new HollowObjectMapper(writeEngine);
mapper.initializeTypeState(Movie.class);
mapper.initializeTypeState(Show.class);
Schemas will be assumed based on the field and type names in the POJOs. Any referenced types will also be traversed and included in the derived data model.
Thread Safety
The HollowObjectMapper
is thread-safe; multiple threads may add Objects at the same time.
Specifying type names in the HollowObjectMapper¶
By default, type names are equal to the names of classes added to the HollowObjectMapper
. Alternatively, the name of a type may be explicitly defined by using the @HollowTypeName
annotation. This annotation can be added at either the class or field level.
The following example Award
class will reference a type AwardName
, which will contain a single string value:
The following example Category
class will be added as the type Genre
, where not otherwise specified by referencing fields:
Namespaced fields
Using the @HollowTypeName
attribute is a convenient way to add appropriate record type namespacing into your data model.
Inlining fields in the HollowObjectMapper¶
You can inline fields in the HollowObjectMapper
by annotating them with @HollowInline
. The following example Creator
class inlines the field creatorName
:
The following java.lang.*
types can be inlined:
String
Boolean
Integer
Long
Double
Float
Short
Byte
Character
Memoizing POJOs in the HollowObjectMapper¶
If a long field named __assigned_ordinal
is defined in a POJO class, then HollowObjectMapper
will use this field to record the assigned ordinal when Objects of this class are added to the state engine.
When the HollowObjectMapper
sees this POJO again, it will short-circuit writing to the state engine and discovering or assigning an ordinal -- it will instead return the previously recorded ordinal. If during processing you can reuse duplicate referenced POJOs, then you can use this effect to greatly speed up adding records to the state engine.
If the __assigned_ordinal
field is present, it should be initialized to HollowConstants.ORDINAL_NONE
. The field may be (but does not have to be) private and/or final.
The following example Director
class uses the __assigned_ordinal
optimization:
public class Director {
long id;
String directorName;
private transient final long __assigned_ordinal = HollowConstants.ORDINAL_NONE;
}
Warning
If the __assigned_ordinal
optimization is used, POJOs should not be modified after they are added to the state engine. Any modifications after the first time a memoized POJO is added to the state engine will be ignored and any references to these POJOs will always point to the originally added record. Using immutable POJOs can help reduce errors.
JSON to Hollow¶
The project hollow-jsonadapter contains a component which will automatically parse json into a HollowWriteStateEngine
. The expected format of the json will be defined by the schemas in the HollowWriteStateEngine
. The data model must be pre-initialized. See the Schema Parser topic in this document for an easy way to configure the schemas with a text document.
The HollowJsonAdapter
class is used to populate records of a single type from a json file. A single record:
{
"id": 1,
"releaseYear": 1999,
"actors": [
{
"id": 101,
"actorName": "Keanu Reeves"
},
{
"id": 102,
"actorName": "Laurence Fishburne"
}
]
}
Can be parsed with the following code:
String json = /// the record above
HollowJsonAdapter jsonAdapter = new HollowJsonAdapter(writeEngine, "Movie");
jsonAdapter.processRecord(json);
If a field defined in the schema is not encountered in the json data, the value will be null in the corresponding Hollow record. If a field is encountered in the json data which is not defined in the schema, the field will be ignored.
A large number of records in a single file can also be processed:
Reader reader = /// a reader for the json file
HollowJsonAdapter jsonAdapter = new HollowJsonAdapter(writeEngine, "Movie");
jsonAdapter.populate(reader);
When processing an entire file, it is expected that the file contains only a single json array of records of the expected type. The records will be processed in parallel.
Hollow to JSON
Hollow objects can be converted to JSON string using HollowRecordJsonStringifier
. Tools backed by Hollow data is one of the cases where this can be useful.
Zeno to Hollow¶
The project hollow-zenoadapter has an adapter which can be used with Hollow’s predecessor, Zeno. We used this as part of our migration path from Zeno to Hollow, and it is provided for current users of Zeno who would like to migrate to Hollow as well. Start with the HollowStateEngineCreator.