Consuming data
In this step, we will learn more about the consume
command by reading the data produced during the previous section.
Prerequisites
You should have completed the two previous steps :
Consuming data
To consume data from the input
topic (alias of the input-topic
), use the following command :
zoe -v --cluster local topics consume input
{"_id":"5b199196ce456e001424256a","text":"Cats can distinguish different flavors in water.","type":"cat","user":{"_id":"5a9ac18c7478810ea6c06381","name":{"first":"Alex","last":"Wohlbruck"}},"upvotes":6,"userUpvoted":null}
{"_id":"5b1b411d841d9700146158d9","text":"The Egyptian Mau’s name is derived from the Middle...","type":"cat","user":{"_id":"5a9ac18c7478810ea6c06381","name":{"first":"Alex","last":"Wohlbruck"}},"upvotes":5,"userUpvoted":null}
{"_id":"591d9b2f227c1a0020d26823","text":"Every year, nearly four million cats are eaten in ...","type":"cat","user":{"_id":"5a9ac18c7478810ea6c06381","name":{"first":"Alex","last":"Wohlbruck"}},"upvotes":4,"userUpvoted":null}
{"_id":"59951d5ef2db18002031693c","text":"America’s cats, including housecats that adventure...","type":"cat","user":{"_id":"5a9ac18c7478810ea6c06381","name":{"first":"Alex","last":"Wohlbruck"}},"upvotes":4,"userUpvoted":null}
{"_id":"5a4d76916ef087002174c28b","text":"A cat’s nose pad is ridged with a unique pattern, ...","type":"cat","user":{"_id":"5a9ac18c7478810ea6c06381","name":{"first":"Alex","last":"Wohlbruck"}},"upvotes":4,"userUpvoted":null}
By default, zoe consumes 5 records starting from the last hour.
Displaying records metadata
To display the records' metadata (record headers, key, offset, timestamp, partition, topic), use the --expose-metadata
option to make zoe inject records metadata in a special field named __metadata__
by default.
zoe -v --cluster local topics consume input -n 5 --expose-metadata
{"__metadata__": {"key":"5b199196ce456e001424256a", "offset":1, "timestamp":1596700800645, "partition":0, "topic":"input","headers":{"traceId":"3b3ae7fa-2a8b-494b-a81c-1c759a479867"}},"content":{"_id":"5b199196ce456e001424256a","text":"Cats can distinguish different flavors in water.","type":"cat","user":{"_id":"5a9ac18c7478810ea6c06381","name":{"first":"Alex","last":"Wohlbruck"}},"upvotes":6,"userUpvoted":null}
{"__metadata__": {"key":"5b1b411d841d9700146158d9", "offset":2, "timestamp":2596700800645, "partition":0, "topic":"input","headers":{"traceId":"4b3ae7fa-2a8b-494b-a81c-1c759a479867"}},"content":{"_id":"5b1b411d841d9700146158d9","text":"The Egyptian Mau’s name is derived from the Middle...","type":"cat","user":{"_id":"5a9ac18c7478810ea6c06381","name":{"first":"Alex","last":"Wohlbruck"}},"upvotes":5,"userUpvoted":null}
{"__metadata__": {"key":"591d9b2f227c1a0020d26823", "offset":3, "timestamp":3596700800645, "partition":0, "topic":"input","headers":{"traceId":"5b3ae7fa-2a8b-494b-a81c-1c759a479867"}},"content":{"_id":"591d9b2f227c1a0020d26823","text":"Every year, nearly four million cats are eaten in ...","type":"cat","user":{"_id":"5a9ac18c7478810ea6c06381","name":{"first":"Alex","last":"Wohlbruck"}},"upvotes":4,"userUpvoted":null}
{"__metadata__": {"key":"59951d5ef2db18002031693c", "offset":1, "timestamp":4596700800645, "partition":1, "topic":"input","headers":{"traceId":"6b3ae7fa-2a8b-494b-a81c-1c759a479867"}},"content":{"_id":"59951d5ef2db18002031693c","text":"America’s cats, including housecats that adventure...","type":"cat","user":{"_id":"5a9ac18c7478810ea6c06381","name":{"first":"Alex","last":"Wohlbruck"}},"upvotes":4,"userUpvoted":null}
{"__metadata__": {"key":"5a4d76916ef087002174c28b", "offset":2, "timestamp":5596700800645, "partition":1, "topic":"input","headers":{"traceId":"7b3ae7fa-2a8b-494b-a81c-1c759a479867"}},"content":{"_id":"5a4d76916ef087002174c28b","text":"A cat’s nose pad is ridged with a unique pattern, ...","type":"cat","user":{"_id":"5a9ac18c7478810ea6c06381","name":{"first":"Alex","last":"Wohlbruck"}},"upvotes":4,"userUpvoted":null}
Controlling the time range
We can control the number of output records (-n
) and the starting time of the consumption (--from
).
For example, to start the consumption from the last 6 hours and read only 2 records:
zoe -v --cluster local topics consume input -n 2 --from 'PT6h'
{"_id":"5b199196ce456e001424256a","text":"Cats can distinguish different flavors in water.","type":"cat","user":{"_id":"5a9ac18c7478810ea6c06381","name":{"first":"Alex","last":"Wohlbruck"}},"upvotes":6,"userUpvoted":null}
{"_id":"5b1b411d841d9700146158d9","text":"The Egyptian Mau’s name is derived from the Middle...","type":"cat","user":{"_id":"5a9ac18c7478810ea6c06381","name":{"first":"Alex","last":"Wohlbruck"}},"upvotes":5,"userUpvoted":null}
The --from
option takes a duration in ISO-8601 format.
Selecting a subset of the fields
We can format the output rows by using the --query
option and giving it a jmespath expression
. Zoe will run this Jmespath expression against each message and the result will be output instead of the original
message itself. A typical use case is when we want only a subset of the existing fields:
zoe -v --cluster local \
topics consume input \
--query '{id: _id, text: text, user: user.name}'
{"id":"5b199196ce456e001424256a","text":"Cats can distinguish different flavors in water.","user":{"first":"Alex","last":"Wohlbruck"}}
{"id":"5b1b411d841d9700146158d9","text":"The Egyptian Mau’s name is derived from the Middle...","user":{"first":"Alex","last":"Wohlbruck"}}
{"id":"591d9b2f227c1a0020d26823","text":"Every year, nearly four million cats are eaten in ...","user":{"first":"Alex","last":"Wohlbruck"}}
{"id":"59951d5ef2db18002031693c","text":"America’s cats, including housecats that adventure...","user":{"first":"Alex","last":"Wohlbruck"}}
{"id":"5a4d76916ef087002174c28b","text":"A cat’s nose pad is ridged with a unique pattern, ...","user":{"first":"Alex","last":"Wohlbruck"}}
Pretty display
Zoe can display the consumed data in a nicely formatted table by using the --output table
option:
zoe -v --cluster local \
--output table \
topics consume input \
--query '{id: _id, text: text, user: user.name}'
┌──────────────────────────┬───────────────────────────────────────────────────────┬───────────────────┐
│ id │ text │ user │
├──────────────────────────┼───────────────────────────────────────────────────────┼───────────────────┤
│ 5b199196ce456e001424256a │ Cats can distinguish different flavors in water. │ first: "Alex" │
│ │ │ last: "Wohlbruck" │
├──────────────────────────┼───────────────────────────────────────────────────────┼───────────────────┤
│ 5b1b411d841d9700146158d9 │ The Egyptian Mau’s name is derived from the Middle... │ first: "Alex" │
│ │ │ last: "Wohlbruck" │
├──────────────────────────┼───────────────────────────────────────────────────────┼───────────────────┤
│ 591d9b2f227c1a0020d26823 │ Every year, nearly four million cats are eaten in ... │ first: "Alex" │
│ │ │ last: "Wohlbruck" │
├──────────────────────────┼───────────────────────────────────────────────────────┼───────────────────┤
│ 59951d5ef2db18002031693c │ America’s cats, including housecats that adventure... │ first: "Alex" │
│ │ │ last: "Wohlbruck" │
├──────────────────────────┼───────────────────────────────────────────────────────┼───────────────────┤
│ 5a4d76916ef087002174c28b │ A cat’s nose pad is ridged with a unique pattern, ... │ first: "Alex" │
│ │ │ last: "Wohlbruck" │
└──────────────────────────┴───────────────────────────────────────────────────────┴───────────────────┘
We can also use the --output json
to output a valid json instead of a json per row :
zoe -v --cluster local \
--output json \
topics consume input \
--query '{id: _id, text: text, user: user.name}' \
-n 2
[
{
"id": "5b199196ce456e001424256a",
"text": "Cats can distinguish different flavors in water.",
"user": {
"first": "Alex",
"last": "Wohlbruck"
}
},
{
"id": "5b1b411d841d9700146158d9",
"text": "The Egyptian Mau’s name is derived from the Middle...",
"user": {
"first": "Alex",
"last": "Wohlbruck"
}
}
]
These display options are not only availabe for the consume
. They are available for all the zoe commands. In fact, Zoe
can consistently display all its output as a table.
Filtering data based on content
Zoe can also use Jmespath expressions that return a boolean to filter the output messages. Zoe runs this expression against each message and depending on the boolean result, zoe will discard the message or not.
This feature can be used to perform searches into Kafka topics. It is one of the most interesting features of Zoe. When
combined with remote runners (ex. --runner kubernetes
) and parallel execution (--jobs 20
to spin up 20 pods), we can
perform expensive searches in large topics in a relatively short amount of time. You can learn more about runners and
parallel execution in the advanced section of the documentation.
Filters are enabled with the --filter
option. For example, to read only Kasimir's cat facts :
zoe -v --cluster local \
topics consume input \
--from 'PT6h' \
--filter "user.name.first == 'Kasimir'" \
--query '{user: user.name, text: text}'
{"user":{"first":"Kasimir","last":"Schulz"},"text":"Cats are the most popular pet in the United States..."}
{"user":{"first":"Kasimir","last":"Schulz"},"text":"In tigers and tabbies, the middle of the tongue is..."}
{"user":{"first":"Kasimir","last":"Schulz"},"text":"A cat can jump up to six times its length."}
{"user":{"first":"Kasimir","last":"Schulz"},"text":"There are cats who have survived falls from over 3..."}
{"user":{"first":"Kasimir","last":"Schulz"},"text":"The technical term for \"hairball\" is \"bezoar.\""}
If we are dealing with a large topic and want to search for seven days of data, we can offload the consumption to kubernetes and spin up 25 pods to consume data in parallel using the following command :
zoe --cluster my-production-cluster \
--runner kubernetes \
topics consume input \
--from 'P7d' \
--filter "user.name.first == 'Kasimir'" \
--jobs 25
This command will not work as is on your computer at this stage because this requires additional work to configure access to a kubernetes cluster with zoe. But there is a tutorial available in this documentation to try out zoe with a kubernetes cluster using Minikube.
Filtering data based on metadata
Zoe by default exposes only records content. You can use the --expose-metadata
flag to expose the record's metadata as
well in a special field maned __metadata__
by default. This field can be accessed as any other field in the --filter
expressions to filter records based on metadata content, as well as --query
expressions.
The example below shows how to filter records on a given offset and partition:
zoe -v --cluster local topics consume input --expose-metadata --filter "__metadata__.offset == \`1\` && __metadata__.partition == \`1\`"
{"__metadata__": {"key":"59951d5ef2db18002031693c", "offset":1, "timestamp":4596700800645, "partition":1, "topic":"input","headers":{"traceId":"6b3ae7fa-2a8b-494b-a81c-1c759a479867"}},"content":{"_id":"59951d5ef2db18002031693c","text":"America’s cats, including housecats that adventure...","type":"cat","user":{"_id":"5a9ac18c7478810ea6c06381","name":{"first":"Alex","last":"Wohlbruck"}},"upvotes":4,"userUpvoted":null}
The name of the injected metadata field can be modified using --metadataf-field-alias
option.
Record headers can also be used for filtering. The following command show an example of filtering records based on header values:
zoe --cluster local topics consume input \
--expose-metadata --metadata-field-alias 'meta' \
--filter "meta.headers.traceId == '123123'"