Profile Outlier Detection using Aggregate Metrics – Profile Analytics
Posted by: Informatica
Profile analytics identifies outliers by comparing the metrics of the latest profile run with statistics aggregated from all previous runs of the same profile. Outliers are detected for the total row count as well as for the individual elements of the profile.
The profile analytics template scans the results of multiple profile runs of a single dataset over time and flags anomalies in the current run against aggregate statistics derived from the previous runs. It can be applied to any dataset that has numeric columns to be profiled and analyzed.
The template can be configured to pick up the runs from a chosen window, say the last 20 days, and build aggregations and baseline statistics from them. The underlying data of the physical data object (PDO) can then be altered and profiled again, and the new run compared with the baseline statistics built from history. The latest profile run is checked for a minimum value of a field falling below the 5th-percentile threshold, a maximum value rising above the 95th-percentile threshold, a sudden spike in null counts, a drop in distinct value counts, and similar changes.
- Generate aggregate statistics from all the historic runs of a profile, which profiling by itself does not provide.
- Compare the latest run of a profile with the generated aggregate statistics and identify outliers in the current dataset, along with changes in null counts and distinct value counts.
- The outputs are written to tables, and this data can be visualized using Tableau or Power BI.
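To make the baseline-and-compare idea concrete, here is a minimal Python sketch. It is not the Informatica template. The `RunMetrics` shape, the function names, and the threshold choices are all hypothetical. In particular, it assumes the percentiles are taken over the per-run metrics across history (for example, the 5th percentile of historical minimum values), and that a null-count spike or distinct-count drop can be tested against the same percentile bounds.

```python
from dataclasses import dataclass


@dataclass
class RunMetrics:
    """Metrics for one column from one profile run (hypothetical shape)."""
    min_value: float
    max_value: float
    null_count: int
    distinct_count: int


def percentile(values, pct):
    """Linear-interpolation percentile of a non-empty list; pct is in [0, 100]."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    rank = (len(ordered) - 1) * pct / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (rank - lower)


def build_baseline(history):
    """Aggregate the historic runs into percentile thresholds (assumed rule)."""
    return {
        "min_p5": percentile([r.min_value for r in history], 5),
        "max_p95": percentile([r.max_value for r in history], 95),
        "null_p95": percentile([r.null_count for r in history], 95),
        "distinct_p5": percentile([r.distinct_count for r in history], 5),
    }


def find_outliers(latest, baseline):
    """Return a list of flags for the latest run; an empty list means no outliers."""
    flags = []
    if latest.min_value < baseline["min_p5"]:
        flags.append("min below 5th percentile")
    if latest.max_value > baseline["max_p95"]:
        flags.append("max above 95th percentile")
    if latest.null_count > baseline["null_p95"]:
        flags.append("null count spike")
    if latest.distinct_count < baseline["distinct_p5"]:
        flags.append("distinct count drop")
    return flags


# Illustrative data only: 20 synthetic historic runs and one suspicious run.
history = [RunMetrics(10 + i, 100 + i, i % 3, 50 + i) for i in range(20)]
baseline = build_baseline(history)
latest = RunMetrics(min_value=5, max_value=110, null_count=40, distinct_count=30)
print(find_outliers(latest, baseline))
```

In this sketch the flags would be the rows written to an output table. A real setup would read the run metrics from the profiling results rather than from synthetic data.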
Informatica DQ Content Team