Types of Data
How I think about types of data.
Batch Data
Single Entities
Sets of Data
Streaming Data
Streaming data is a never-ending flow of data. Examples include tweets on Twitter, hospital bed telemetry, subway ticket scans, and so on. I like to further categorize streaming data as follows:
Application Logs
Intra-application Messages
Inter-application Messages
Business Events
Application Logs
In this scenario, your application is writing logs so engineers can see what's going on and troubleshoot technical issues.
This data is meant for you and only you. It's not meant for business folks to use, so avoid providing it for analytical purposes.
Ensure you log discrete data, avoid logging sensitive data, and use appropriate logging levels.
Educate others on how to find your application's logs and what IDs can tie logs across applications.
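As a rough illustration of the logging advice above, here is a minimal Python sketch using the standard logging module. It writes each log entry as JSON with discrete fields, masks sensitive values, drops anything below the INFO level, and carries an ID that can tie logs together across applications. The names SENSITIVE_KEYS, JsonFormatter, build_logger, and correlation_id are assumptions made up for this example, not part of any particular logging standard.

```python
import io
import json
import logging

# Hypothetical list of field names that must never reach the logs.
SENSITIVE_KEYS = {"password", "ssn", "card_number"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with discrete fields."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # ID shared across applications so logs can be tied together.
            "correlation_id": getattr(record, "correlation_id", None),
        }
        # Extra fields are passed via `extra={"fields": {...}}`.
        for key, value in getattr(record, "fields", {}).items():
            payload[key] = "[REDACTED]" if key in SENSITIVE_KEYS else value
        return json.dumps(payload)


def build_logger(stream):
    """Create a logger that writes JSON lines at INFO level and above."""
    logger = logging.getLogger("orders")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


def log_lines(stream):
    """Parse everything written to an in-memory stream back into dicts."""
    return [json.loads(line) for line in stream.getvalue().splitlines()]


if __name__ == "__main__":
    stream = io.StringIO()
    logger = build_logger(stream)
    # Dropped: DEBUG is below the configured INFO level.
    logger.debug("cache miss", extra={"correlation_id": "req-123"})
    logger.info(
        "order received",
        extra={
            "correlation_id": "req-123",
            "fields": {"order_id": 42, "card_number": "not-a-real-number"},
        },
    )
    print(stream.getvalue(), end="")
```

Keeping each value in its own field makes the logs easy to search, and passing the same correlation ID from one application to the next is one way to follow a single request across systems.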
Intra-application Messages
In this scenario, your application has separate components that communicate with one another.
This data is meant for you and only you. It's not meant for business folks to use, so avoid providing it for analytical purposes.
Inter-application Messages
In this scenario, your application submits data to another application you do not control.
This data is meant for the receiving system only. It's not meant for business folks to use, so avoid providing it for analytical purposes.
Business Events
In this scenario, applications publish true fire-and-forget business events meant for others to consume, including the subscribers responsible for populating the analytical data store.
Example events might include Order Placed, New Customer Registration, Promo Code Used, and so on. The data format is crafted specifically for others to consume.
Avoid making breaking changes to your events. If a breaking change is needed, version the event and give folks time to move to the newer version (i.e., publish both versions for some time).
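To make the versioning advice concrete, here is a small Python sketch of an Order Placed event that went through a breaking change (a float total replaced by an integer amount in cents plus a currency). During the migration window the publisher emits both versions. Everything here is an assumption for illustration: the field names, the topic names, the envelope layout, and the publish callable, which stands in for whatever broker client you actually use.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class OrderPlacedV1:
    order_id: str
    customer_id: str
    total: float  # original shape: amount with no currency


@dataclass(frozen=True)
class OrderPlacedV2:
    order_id: str
    customer_id: str
    total_cents: int  # breaking change: integer cents...
    currency: str  # ...plus an explicit currency code


def envelope(event_type, version, payload):
    """Wrap a payload so consumers can tell the event type and version."""
    return json.dumps(
        {"type": event_type, "version": version, "data": asdict(payload)},
        sort_keys=True,
    )


def publish_order_placed(
    publish, order_id, customer_id, total_cents, currency, include_v1=True
):
    """Publish the new version and, while consumers migrate, the old one.

    `publish` is a hypothetical callable taking (topic, message).
    """
    v2 = OrderPlacedV2(order_id, customer_id, total_cents, currency)
    publish("order-placed.v2", envelope("OrderPlaced", 2, v2))
    if include_v1:
        # Keep the old contract alive until subscribers have moved over.
        v1 = OrderPlacedV1(order_id, customer_id, total_cents / 100)
        publish("order-placed.v1", envelope("OrderPlaced", 1, v1))


if __name__ == "__main__":
    publish_order_placed(
        lambda topic, message: print(topic, message),
        order_id="o-1",
        customer_id="c-9",
        total_cents=1999,
        currency="USD",
    )
```

Once every subscriber has moved to the newer version, calling publish_order_placed with include_v1=False retires the old event without touching the new one.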