Avro is an open source Apache project that provides data serialization
and data exchange services for Hadoop. Using Avro (which is similar in function to
systems such as Apache Thrift and Google's Protocol Buffers), data
can be exchanged between programs written in any language. Avro has been gaining users
relative to other popular serialization frameworks, largely because many
Hadoop-based tools support Avro for serialization and deserialization.
Before we get into the features, let's first understand serialization and deserialization.
Serialization means
turning structured objects into a byte stream for transmission over the
network or for writing to persistent storage.
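As a quick illustration, here is a minimal Java sketch of serializing a record with Avro's generic API; the "User" schema and its "name"/"age" fields are made up for this example, not part of the original text.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.EncoderFactory;

public class SerializeSketch {
    // Hypothetical schema used only for this illustration.
    static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"age\",\"type\":\"int\"}]}";

    public static byte[] serialize() throws IOException {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);

        // Build a structured object (no generated class needed).
        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "alice");
        user.put("age", 30);

        // Turn the record into a byte stream.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        writer.write(user, encoder);
        encoder.flush();
        return out.toByteArray();
    }
}
```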
Deserialization is the
opposite of serialization: we read a byte stream from the network or from persistent
storage and turn it back into structured objects.
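And here is the reverse direction, a sketch that turns the bytes produced above back into a structured record, assuming the reader is given the same hypothetical "User" schema.

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DatumReader;
import org.apache.avro.io.DecoderFactory;

public class DeserializeSketch {
    public static GenericRecord deserialize(byte[] bytes, Schema schema) throws IOException {
        // Turn the byte stream back into a structured record.
        DatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
        GenericRecord user = reader.read(null, decoder);

        // Fields are accessed by name, exactly as they were written.
        System.out.println(user.get("name") + " / " + user.get("age"));
        return user;
    }
}
```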
The serialized data, which is in a binary format, is
accompanied by its schema, allowing any application to deserialize it.
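One way to see this is Avro's object container file format: the writer embeds the schema in the file header, and any reader can recover it from the file itself. A rough sketch, again using the made-up "User" schema and a hypothetical users.avro file:

```java
import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class ContainerFileSketch {
    public static void main(String[] args) throws IOException {
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"age\",\"type\":\"int\"}]}");

        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "alice");
        user.put("age", 30);

        // Write an Avro container file; the schema is stored in its header.
        File file = new File("users.avro");
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.create(schema, file);
            writer.append(user);
        }

        // Any reader can recover the schema from the file, with no prior knowledge.
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
            System.out.println("embedded schema: " + reader.getSchema());
            for (GenericRecord record : reader) {
                System.out.println(record);
            }
        }
    }
}
```

Because the schema travels in the file header, a tool that has never seen the writer's code can still iterate over the records.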
Some Features Of Avro
- Avro-serialized data doesn't require proxy objects or code generation (unless desired for statically typed languages). Avro uses schema definitions at runtime during data exchange, and it always stores the data structure definition with the data, which makes the data easier to process than relying on generated code (see the sketch after this list).
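A minimal sketch of that runtime behaviour, assuming the bytes were written with the hypothetical "User" schema from the earlier examples: the reader supplies its own schema as a plain runtime value, and Avro resolves the differences while decoding, with no generated classes involved.

```java
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

public class RuntimeSchemaSketch {
    // Hypothetical reader schema: the same "User" record, but it keeps only
    // "name" and adds a "country" field with a default value.
    static final String READER_SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
        + "{\"name\":\"name\",\"type\":\"string\"},"
        + "{\"name\":\"country\",\"type\":\"string\",\"default\":\"unknown\"}]}";

    public static GenericRecord readWithReaderSchema(byte[] bytes, Schema writerSchema)
            throws IOException {
        Schema readerSchema = new Schema.Parser().parse(READER_SCHEMA_JSON);

        // Both schemas are ordinary runtime values; Avro resolves the
        // differences (the dropped "age", the defaulted "country") as it decodes.
        GenericDatumReader<GenericRecord> reader =
            new GenericDatumReader<>(writerSchema, readerSchema);
        BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
        return reader.read(null, decoder);
    }
}
```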