1- Demographics Overview
InReality’s Analytics platform has a Demographics (face detection and tracking) data source created from InReality’s Awareness (Demographic) software engine. Each detected face is assigned a unique, anonymized facial ID. With this ID, you can distinguish between repeat and new viewers and map a consumer’s journey of interest across the venue or retail location via strategically placed endpoints. Think of it as the physical equivalent of a website cookie for brick-and-mortar locations. The unique facial ID is specific to InReality’s technology and is a critical component for data reliability. The engine uses it to determine whether a person has previously been identified by the sensor, and it also captures data points such as:
- Gender (first face detected) - UNKNOWN, MALE, FEMALE
- Age (first face detected) - NUMBER or UNKNOWN
- Age group (first face detected) - YOUTH, YOUNG ADULT, ADULT, SENIOR, UNKNOWN
- Proximity (first face detected) - NUMBER (measured in CM)
- Number of faces in FOV - NUMBER
- Gender Ratio - MORE MALE, MORE FEMALE, SAME
- Age Spread - NUMBER
- Smile - YES, NO
- Attention - YES, NO
- Emotions - NEUTRAL, HAPPY, SAD, SURPRISE, ANGER
The following sections provide descriptions of performance testing, calculations and test results.
2- Test Results Overview - Accuracy
The accuracy of the reported analytics is impacted by several factors, including the camera position, face perspective and distance, area lighting, accessories such as hats and glasses, whether or not there is a filter, and facial expressions.
In brief, the InReality metrics test results are:
Gender: 95% accurate
Age Group: 80% accurate
Age: 80% accurate (+/- 5 years)
Dwell Time: 95% accurate (+/- 1 sec)
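One plausible reading of a figure such as "80% accurate (+/- 5 years)" is the share of estimates that fall within the stated tolerance of the true value. The sketch below illustrates that reading only; the function name and the sample ages are hypothetical and are not taken from InReality's test procedure.

```python
def accuracy_within_tolerance(actual, predicted, tolerance):
    """Return the fraction of predictions whose absolute error is <= tolerance."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one observation is required")
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return hits / len(actual)


# Hypothetical ground-truth and estimated ages (illustrative values only).
true_ages = [23, 31, 45, 52, 67]
estimated_ages = [26, 30, 52, 49, 66]

age_score = accuracy_within_tolerance(true_ages, estimated_ages, tolerance=5)
print(f"Share of age estimates within +/- 5 years: {age_score:.0%}")
```

The same function could be applied to dwell time by passing durations in seconds with a tolerance of 1.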
3- Measuring Accuracy
In the field of machine learning and specifically the problem of statistical classification, a confusion matrix, also known as an error matrix, is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one. Each row of the matrix represents the instances in a predicted class while each column represents the instances in an actual class (or vice versa). The name stems from the fact that it is easy to see if the system is confusing two classes (i.e. commonly mislabeling one as another).
It is a special kind of contingency table, with two dimensions ("actual" and "predicted"), and identical sets of "classes" in both dimensions. Each combination of dimension and class is a variable in the table.
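As an illustration, a confusion matrix for a multi-class attribute such as age group can be built by counting (actual, predicted) pairs. This is a minimal sketch: the labels below are made-up sample data, and the layout puts actual classes in rows and predicted classes in columns, which is one of the two orientations mentioned above.

```python
from collections import Counter


def confusion_matrix(actual, predicted, classes):
    """Return matrix[i][j] = number of samples whose actual class is
    classes[i] and whose predicted class is classes[j]."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


classes = ["YOUTH", "YOUNG ADULT", "ADULT", "SENIOR"]

# Hypothetical labels (illustrative only, not InReality test data).
actual = ["YOUTH", "ADULT", "ADULT", "SENIOR", "YOUNG ADULT", "ADULT"]
predicted = ["YOUTH", "ADULT", "YOUNG ADULT", "SENIOR", "YOUNG ADULT", "SENIOR"]

matrix = confusion_matrix(actual, predicted, classes)
for label, row in zip(classes, matrix):
    print(f"{label:12s} {row}")
```

Off-diagonal entries show which classes are being confused with each other; here the ADULT row has counts outside the diagonal.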

3.1- Classes: True Positive, True Negative, False Positive, False Negative
True positives and true negatives are the observations that are correctly predicted and therefore shown in green. We want to minimize false positives and false negatives so they are shown in red. These terms may be confusing, so let’s take each term individually and understand it fully.
True Positives (TP) – These are the correctly predicted positive values. The value of the actual class is yes and the value of the predicted class is also yes. A true positive indicates a correct identification.
True Negatives (TN) – These are the correctly predicted negative values. The value of the actual class is no and the value of the predicted class is also no. A true negative indicates a correct rejection.
False Positives (FP) – These are the incorrectly predicted positive values. The value of the actual class is no but the value of the predicted class is yes. A false positive indicates an incorrect identification.
False Negatives (FN) – These are the incorrectly predicted negative values. The value of the actual class is yes but the value of the predicted class is no. A false negative indicates an incorrect rejection.
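The four outcomes can be counted directly from paired actual and predicted labels once one class is chosen as "positive". The sketch below assumes FEMALE is treated as the positive class purely for illustration; the function name and the sample labels are hypothetical.

```python
def count_outcomes(actual, predicted, positive):
    """Count TP, TN, FP and FN for a binary problem, given the positive label."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1  # predicted yes, actually yes
            else:
                fp += 1  # predicted yes, actually no
        else:
            if a == positive:
                fn += 1  # predicted no, actually yes
            else:
                tn += 1  # predicted no, actually no
    return {"TP": tp, "TN": tn, "FP": fp, "FN": fn}


# Hypothetical labels (illustrative only).
actual = ["FEMALE", "FEMALE", "MALE", "MALE", "FEMALE", "MALE"]
predicted = ["FEMALE", "MALE", "MALE", "FEMALE", "FEMALE", "MALE"]

outcomes = count_outcomes(actual, predicted, positive="FEMALE")
print(outcomes)
```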
3.2- Evaluating performance of a model via: Accuracy, Precision, Recall & F1 Score Metrics
Once we understand these four (TP, TN, FP, FN) parameters, we can calculate Accuracy, Precision, Recall and F1 score.
Precision – Precision is the ratio of correctly predicted positive observations to the total predicted positive observations: Precision = TP / (TP + FP). In other words, precision is the percentage of positive class predictions that actually belong to the positive class. High precision corresponds to a low false positive rate. Our precision is 0.788, meaning roughly 79% of positive predictions were correct.

Recall (Sensitivity) – Recall is the ratio of correctly predicted positive observations to all observations in the actual class labeled yes: Recall = TP / (TP + FN). In other words, recall is the percentage of actual positives that were correctly identified. Our recall is 0.631, which we consider acceptable for this model because it is above 0.5.

F1 Score – F1 Score is the harmonic mean of Precision and Recall: F1 = 2 x (Precision x Recall) / (Precision + Recall). This score therefore takes both false positives and false negatives into account. It is not as intuitive as accuracy, but F1 is usually more useful than accuracy, especially if the class distribution is uneven. Accuracy works best if false positives and false negatives have similar costs. If their costs are very different, it is better to look at both Precision and Recall. In our case, the F1 score is 0.701.

Accuracy – Accuracy is the most intuitive performance measure: it is the ratio of correctly predicted observations to the total observations, Accuracy = (TP + TN) / (TP + TN + FP + FN). It is tempting to conclude that a model with high accuracy is the best model. Accuracy is a great measure, but only for symmetric datasets where the numbers of false positives and false negatives are almost the same. Otherwise, you have to look at the other metrics to evaluate the performance of your model. Our model scores 0.803, which means it is approximately 80% accurate.
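The four metrics can be computed from the TP, TN, FP and FN counts as sketched below. The counts are hypothetical and chosen only to make the arithmetic easy to follow; they are not InReality's test counts. The final lines check that the reported precision (0.788) and recall (0.631) are consistent with the reported F1 score when F1 is computed as the harmonic mean 2PR / (P + R).

```python
def precision(tp, fp):
    """Share of positive predictions that were correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual positives that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def accuracy(tp, tn, fp, fn):
    """Share of all observations that were predicted correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


# Hypothetical counts (illustrative only).
tp, tn, fp, fn = 40, 40, 10, 10

p = precision(tp, fp)
r = recall(tp, fn)
print(f"precision={p:.3f} recall={r:.3f} "
      f"f1={f1_score(p, r):.3f} accuracy={accuracy(tp, tn, fp, fn):.3f}")

# Consistency check on the figures reported in this section.
reported_f1 = round(f1_score(0.788, 0.631), 3)
print(f"F1 from reported precision and recall: {reported_f1}")
```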

4- Performance Testing Results
4.1- Testing Dataset
The results are based on internal laboratory testing in which the images were passed directly into InReality’s AVA (Anonymous Video Analytics) engine. The test dataset contains 8,000 images, each displaying a single individual's face in a public setting.

4.2- Demographics AVA (Anonymous Video Analytics)
F1 Score (Gender):
Male = 0.95
Female = 0.95

5- Summary of Results
The accuracy of the reported analytics is impacted by several factors, including the camera position, face perspective and distance, area lighting, accessories such as hats and glasses, whether or not there is a filter, and facial expressions.
5.1- Test Results Overview
In brief, the InReality metrics test results are:
Gender: 95% accurate
Age Group: 80% accurate
Age: 80% accurate (+/- 5 years)
Dwell Time: 95% accurate (+/- 1 sec)
5.2- Test Results Details
InReality’s AVA scores very high at detecting the gender of the individuals in the test sample and achieves a satisfactory score for the most common age group in the data set (20 - 34). These scores indicate that data collected in venues is reliable when the camera is set up properly.


6- Recommendations for Getting the Best Results
The accuracy of the reported analytics is impacted by several factors including position of camera, face perspective and distance, area lighting, accessories such as hats and glasses, and facial expressions. This reported data is intended to be used to establish a baseline performance of the marketing material with which it is associated and gauge the subsequent impact of changes.
For the best results, the camera must be mounted near face height so that it sees faces from the front (not too high or low), and it should have properly working Auto Exposure (AE), Auto Focus (AF) and Auto White Balance (AWB) so that image quality is adequate. Illumination should be constant. Faces should appear large enough in the image that the distance between the eyes is greater than 100 pixels.
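When validating a camera placement, the face-size guideline can be checked from eye landmark coordinates. The sketch below assumes eye centers are available as (x, y) pixel positions from some face landmark detector; the function names, the constant name and the sample coordinates are hypothetical.

```python
import math

# Guideline from this section: more than 100 pixels between the eyes.
MIN_EYE_DISTANCE_PX = 100


def eye_distance_px(left_eye, right_eye):
    """Euclidean distance in pixels between two (x, y) eye centers."""
    return math.dist(left_eye, right_eye)


def face_large_enough(left_eye, right_eye, minimum_px=MIN_EYE_DISTANCE_PX):
    """True if the inter-eye distance exceeds the recommended minimum."""
    return eye_distance_px(left_eye, right_eye) > minimum_px


# Hypothetical landmark positions (illustrative only).
near_face = face_large_enough((400, 300), (520, 300))  # 120 px apart
far_face = face_large_enough((400, 300), (460, 300))   # 60 px apart
print(near_face, far_face)
```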
6.1- Face Posture
The engine has certain tolerance to face posture:
- Head roll (tilt) – ±15 degrees.
- Head pitch (nod) – ±15 degrees from frontal position. The head pitch tolerance can be increased up to ±25 degrees if several views of the same face that cover different pitch angles were used during enrollment.
- Head yaw (bobble) – up to ±45 degrees from frontal position (configurable). The default value of ±15 degrees is the fastest setting and is usually sufficient for most near-frontal face images.
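The posture tolerances above can be expressed as a simple range check, for example to filter test images or to verify a camera placement. This is a minimal sketch, assuming head angles in degrees are available from some pose estimator; the class and function names are hypothetical and do not represent the engine's API. The defaults follow the values stated above, with yaw at its ±15 degree default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoseTolerance:
    """Maximum absolute head angles, in degrees from the frontal position."""
    roll_deg: float = 15.0
    pitch_deg: float = 15.0  # stated as extendable up to 25 with multi-view enrollment
    yaw_deg: float = 15.0    # stated default; configurable up to 45


def pose_within_tolerance(roll, pitch, yaw, tol=PoseTolerance()):
    """True if all three head angles fall inside the given tolerance."""
    return (
        abs(roll) <= tol.roll_deg
        and abs(pitch) <= tol.pitch_deg
        and abs(yaw) <= tol.yaw_deg
    )


# Hypothetical head angles (illustrative only).
print(pose_within_tolerance(roll=5, pitch=-10, yaw=12))  # inside the defaults
print(pose_within_tolerance(roll=0, pitch=0, yaw=30))    # yaw beyond the default
print(pose_within_tolerance(0, 0, 30, PoseTolerance(yaw_deg=45.0)))
```

Keeping the limits in one immutable object makes it easy to compare a default configuration against a wider yaw or pitch setting.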