1- Demographics Overview

InReality’s Analytics platform has a Demographics (face detection and tracking) data source created from InReality’s Awareness (Demographic) software engine. Each face detected is assigned a unique, anonymized facial ID. With this ID, you can distinguish between repeat and new viewers and map a consumer’s journey of interest across the venue or retail location via strategically placed endpoints. Think of it as a physical website cookie, but for brick and mortar locations. The unique facial ID is specific to InReality’s technology and is a critical component for data reliability. It determines whether a person has been previously identified by the sensor, while also capturing different data points such as:

Gender (first face detected) - UNKNOWN, MALE, FEMALE
Age (first face detected) - NUMBER or UNKNOWN
Age group (first face detected) - YOUTH, YOUNG ADULT, ADULT, SENIOR, UNKNOWN
Proximity (first face detected) - NUMBER (measured in CM)
Number of faces in FOV - NUMBER
Gender Ratio - MORE MALE, MORE FEMALE, SAME
Age Spread - NUMBER
Smile - YES, NO
Attention - YES, NO
Emotions - NEUTRAL, HAPPY, SAD, SURPRISE, ANGER
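
As an illustration only, the data points listed above could be modeled as a typed record. The sketch below is not InReality's actual schema: the class names, field names, and the `face_id` format are all hypothetical, and only the allowed values come from the list above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Gender(Enum):
    UNKNOWN = "UNKNOWN"
    MALE = "MALE"
    FEMALE = "FEMALE"


class AgeGroup(Enum):
    YOUTH = "YOUTH"
    YOUNG_ADULT = "YOUNG ADULT"
    ADULT = "ADULT"
    SENIOR = "SENIOR"
    UNKNOWN = "UNKNOWN"


class GenderRatio(Enum):
    MORE_MALE = "MORE MALE"
    MORE_FEMALE = "MORE FEMALE"
    SAME = "SAME"


class Emotion(Enum):
    NEUTRAL = "NEUTRAL"
    HAPPY = "HAPPY"
    SAD = "SAD"
    SURPRISE = "SURPRISE"
    ANGER = "ANGER"


@dataclass(frozen=True)
class DemographicsRecord:
    """Hypothetical container for one detection event (names are assumptions)."""
    face_id: str              # anonymized facial ID
    gender: Gender            # first face detected
    age: Optional[int]        # first face detected; None stands in for UNKNOWN
    age_group: AgeGroup       # first face detected
    proximity_cm: float       # first face detected, in centimeters
    faces_in_fov: int         # number of faces in the field of view
    gender_ratio: GenderRatio
    age_spread: float
    smile: bool
    attention: bool
    emotion: Emotion


# Made-up example values, for illustration only.
example = DemographicsRecord(
    face_id="a1b2c3",
    gender=Gender.FEMALE,
    age=29,
    age_group=AgeGroup.YOUNG_ADULT,
    proximity_cm=140.0,
    faces_in_fov=2,
    gender_ratio=GenderRatio.SAME,
    age_spread=6.0,
    smile=True,
    attention=True,
    emotion=Emotion.HAPPY,
)
```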

The following sections provide descriptions of performance testing, calculations and test results.

2- Test Results Overview - Accuracy

The accuracy of the reported analytics is impacted by several factors, including the camera position, face perspective and distance, area lighting, accessories such as hats and glasses, whether or not there is a filter, and facial expressions.

In brief, the InReality metrics test results are:

Gender: 95% accurate

Age Group: 80% accurate

Age: 80% accurate (+/- 5 years)

Dwell Time: 95% accurate (+/- 1 sec)
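
The tolerance figures above (+/- 5 years, +/- 1 sec) can be read as "a prediction counts as correct when it falls within the tolerance of the true value." That reading is an assumption, since the exact scoring rule is not stated here. Under it, a minimal sketch of the calculation looks like this (the function name and the sample ages are made up for illustration):

```python
from typing import Sequence


def accuracy_within_tolerance(
    actual: Sequence[float], predicted: Sequence[float], tolerance: float
) -> float:
    """Fraction of predictions that lie within +/- tolerance of the true value."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return hits / len(actual)


# Illustrative data only, not InReality test data.
actual_ages = [23, 31, 45, 52, 67]
predicted_ages = [25, 38, 44, 47, 66]

# Four of the five predictions are within 5 years (31 vs 38 is not).
age_accuracy = accuracy_within_tolerance(actual_ages, predicted_ages, tolerance=5)
```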

3- Measuring Accuracy

In the field of machine learning and specifically the problem of statistical classification, a confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one. Each row of the matrix represents the instances in a predicted class while each column represents the instances in an actual class (or vice versa). The name stems from the fact that it is easy to see if the system is confusing two classes (i.e. commonly mislabeling one as another).

It is a special kind of contingency table, with two dimensions ("actual" and "predicted"), and identical sets of "classes" in both dimensions. Each combination of dimension and class is a variable in the table.
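
To make the table layout concrete, here is a minimal sketch that builds a confusion matrix by counting (actual, predicted) pairs. The labels are made-up sample data, and the choice of rows for the actual class and columns for the predicted class is just one of the two conventions mentioned above.

```python
from collections import Counter
from typing import Sequence


def confusion_matrix(actual: Sequence[str], predicted: Sequence[str]) -> Counter:
    """Count how often each (actual, predicted) label pair occurs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(actual, predicted))


# Illustrative labels only.
actual = ["MALE", "MALE", "FEMALE", "FEMALE", "FEMALE", "MALE"]
predicted = ["MALE", "FEMALE", "FEMALE", "FEMALE", "MALE", "MALE"]

matrix = confusion_matrix(actual, predicted)
classes = sorted(set(actual) | set(predicted))

# Print the table: one row per actual class, one column per predicted class.
print("actual \\ predicted", *classes, sep="\t")
for a in classes:
    print(a, *(matrix[(a, p)] for p in classes), sep="\t")
```

The off-diagonal cells (for example, actual MALE predicted as FEMALE) are where the system is confusing one class with another.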

3.1- Classes: True Positive, True Negative, False Positive, False Negative

True positives and true negatives are the correctly predicted observations and are therefore shown in green. False positives and false negatives are errors that we want to minimize, so they are shown in red. These terms can be confusing, so each one is defined below.

True Positives (TP) – These are the correctly predicted positive values. The value of the actual class is yes and the value of the predicted class is also yes. A true positive indicates a correct identification.

True Negatives (TN) – These are the correctly predicted negative values. The value of the actual class is no and the value of the predicted class is also no. A true negative indicates a correct rejection.

False Positives (FP) – These are the incorrectly predicted positive values: the actual class is no, but the predicted class is yes. A false positive indicates an incorrect identification.

False Negatives (FN) – These are the incorrectly predicted negative values: the actual class is yes, but the predicted class is no. A false negative indicates an incorrect rejection.
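
The four definitions above can be sketched as a simple counting routine for a yes/no classifier. The function name, the `Counts` type, and the sample labels are illustrative assumptions; `True` stands for "yes" and `False` for "no".

```python
from typing import NamedTuple, Sequence


class Counts(NamedTuple):
    tp: int  # actual yes, predicted yes
    tn: int  # actual no, predicted no
    fp: int  # actual no, predicted yes
    fn: int  # actual yes, predicted no


def count_outcomes(actual: Sequence[bool], predicted: Sequence[bool]) -> Counts:
    """Tally true/false positives and negatives for a binary classifier."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1
        elif not a and not p:
            tn += 1
        elif not a and p:
            fp += 1
        else:
            fn += 1
    return Counts(tp, tn, fp, fn)


# Illustrative labels only.
actual = [True, True, False, False, True, False]
predicted = [True, False, False, True, True, False]
counts = count_outcomes(actual, predicted)
```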

3.2- Evaluating performance of a model via: Accuracy, Precision, Recall & F1 Score Metrics

Once we understand these four (TP, TN, FP, FN) parameters, we can calculate Accuracy, Precision, Recall and F1 score.

Precision – Precision is the ratio of correctly predicted positive observations to the total predicted positive observations. Therefore, precision shows the percentage of positive class predictions that actually belong to the positive class. High precision means that few of the positive predictions are false positives. We have a precision of 0.788, which is fairly high.

Recall (Sensitivity) – Recall is the ratio of correctly predicted positive observations to all observations whose actual class is yes. Therefore, recall shows the percentage of actual positives that the model correctly identifies. We have a recall of 0.631, which is good for this model as it is above 0.5.

F1 Score – F1 Score is the harmonic mean of Precision and Recall. Therefore, this score takes both false positives and false negatives into account. Intuitively it is not as easy to understand as accuracy, but F1 is usually more useful than accuracy, especially if you have an uneven class distribution. Accuracy works best if false positives and false negatives have similar costs. If the costs of false positives and false negatives are very different, it is better to look at both Precision and Recall. In our case, the F1 score is 0.701.

Accuracy – Accuracy is the most intuitive performance measure: it is the ratio of correctly predicted observations to the total observations. One may think that a model with high accuracy must be the best one. Accuracy is a great measure, but only when you have symmetric datasets in which the numbers of false positives and false negatives are almost the same. Otherwise, you have to look at the other parameters to evaluate the performance of your model. For our model, the accuracy is 0.803, which means our model is approximately 80% accurate.
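
The four metrics follow directly from the TP, TN, FP and FN counts. The sketch below uses the standard textbook formulas. The function names and the sample counts are illustrative, and the functions assume non-zero denominators. As a consistency check, combining the precision (0.788) and recall (0.631) quoted above with the F1 formula reproduces the quoted F1 score of 0.701.

```python
def precision(tp: int, fp: int) -> float:
    """Correct positive predictions divided by all positive predictions."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Correct positive predictions divided by all actual positives."""
    return tp / (tp + fn)


def f1_score(precision_value: float, recall_value: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision_value * recall_value / (precision_value + recall_value)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Correct predictions divided by all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


# Illustrative counts only, not InReality test data.
tp, tn, fp, fn = 8, 9, 2, 1
p = precision(tp, fp)           # 8 / 10
r = recall(tp, fn)              # 8 / 9
f1 = f1_score(p, r)
acc = accuracy(tp, tn, fp, fn)  # 17 / 20

# Consistency check against the precision and recall quoted in this section.
quoted_f1 = round(f1_score(0.788, 0.631), 3)
```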

4- Performance Testing Results

4.1- Testing Dataset

The results are based on internal laboratory testing in which images were passed directly into InReality’s AVA (Anonymous Video Analytics) engine. The test dataset contains 8,000 images, each displaying a single individual's face in a public setting.

Screen_Shot_2022-03-31_at_11.11.25_AM.png

4.2- Demographics AVA (Anonymous Video Analytics)

F1 Score: Male = 0.95, Female = 0.95

Screen_Shot_2022-03-31_at_11.11.52_AM.png

5- Summary of Results

5.1- Test Results Overview

In brief, the InReality metrics test results are:

Gender: 95% accurate

Age Group: 80% accurate

Age: 80% accurate (+/- 5 years)

Dwell Time: 95% accurate (+/- 1 sec)

5.2- Test Results Details

InReality’s AVA scores very high on detecting the gender of the individuals in the sample and achieves a satisfactory score for the most common age group in the data set (20 - 34). These scores indicate that the data collected in venues is reliable when the camera is set up properly.

Sect_5.2A.png

Sect_5.2B.png

6- Recommendations for Getting the Best Results

The accuracy of the reported analytics is impacted by several factors including position of camera, face perspective and distance, area lighting, accessories such as hats and glasses, and facial expressions. This reported data is intended to be used to establish a baseline performance of the marketing material with which it is associated and gauge the subsequent impact of changes.

For the best results, the camera must be mounted near face height so that it sees faces from the front (not too high or too low), and it should have proper Auto Exposure (AE), Auto Focus (AF), and Auto White Balance (AWB) so that image quality is adequate. Illumination should be constant. The face should be large enough in the image that the distance between the eyes is greater than 100 pixels.
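
One way to check the face-size recommendation during camera setup is to measure the pixel distance between the two eye centers in a captured frame. The sketch below assumes you already have eye coordinates in pixels from some face landmark tool. The function names, the constant, and the sample coordinates are hypothetical and not part of InReality's software.

```python
import math
from typing import Tuple

Point = Tuple[float, float]

# Threshold taken from the recommendation above: more than 100 pixels between the eyes.
MIN_EYE_DISTANCE_PX = 100.0


def eye_distance_px(left_eye: Point, right_eye: Point) -> float:
    """Euclidean distance between the two eye centers, in pixels."""
    return math.dist(left_eye, right_eye)


def face_is_large_enough(
    left_eye: Point, right_eye: Point, minimum_px: float = MIN_EYE_DISTANCE_PX
) -> bool:
    """True when the distance between the eyes exceeds the recommended minimum."""
    return eye_distance_px(left_eye, right_eye) > minimum_px


# Illustrative coordinates only.
close_face = face_is_large_enough((400.0, 300.0), (520.0, 305.0))  # about 120 px apart
far_face = face_is_large_enough((400.0, 300.0), (460.0, 300.0))    # 60 px apart
```

If faces at the expected viewing distance fall below the threshold, moving the camera closer or using a higher-resolution image would increase the pixel distance.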

6.1- Face Posture

The engine has certain tolerance to face posture:

head roll (tilt) – ±15 degrees;
head pitch (nod) – ±15 degrees from frontal position. The head pitch tolerance can be increased up to ±25 degrees if several views of the same face that cover different pitch angles were used during enrollment;
head yaw (bobble) – ±45 degrees from frontal position (configurable). The default value of ±15 degrees is the fastest setting and is usually sufficient for most near-frontal face images.
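
The tolerances above can be expressed as a simple range check on estimated head angles. This is a sketch under stated assumptions, not the engine's actual logic or configuration API: the `PoseTolerance` class, its field names, and the function name are hypothetical, and angles are assumed to be in degrees relative to the frontal position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoseTolerance:
    """Hypothetical tolerance settings, in degrees from the frontal position."""
    roll_deg: float = 15.0
    pitch_deg: float = 15.0
    yaw_deg: float = 15.0  # default; the text above allows configuring up to 45


def pose_within_tolerance(
    roll: float, pitch: float, yaw: float, tolerance: PoseTolerance = PoseTolerance()
) -> bool:
    """True when every head angle lies within its +/- tolerance."""
    return (
        abs(roll) <= tolerance.roll_deg
        and abs(pitch) <= tolerance.pitch_deg
        and abs(yaw) <= tolerance.yaw_deg
    )


# A face turned 30 degrees to the side fails the default yaw setting
# but passes when yaw tolerance is widened to the 45-degree maximum.
default_result = pose_within_tolerance(roll=5.0, pitch=-10.0, yaw=30.0)
wide_result = pose_within_tolerance(
    roll=5.0, pitch=-10.0, yaw=30.0, tolerance=PoseTolerance(yaw_deg=45.0)
)
```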

Camera Analytics - Demographics Performance Benchmark - Overview