Behind the Data: Validation Study Tests Accuracy of Homescan Data

Nielsen Homescan data provide a wealth of information about household purchasing patterns, allowing researchers to address questions relating to the dynamics of retail food markets. Households participating in the Homescan panel use a scanner to record prices and quantities of food products purchased at a wide variety of stores. ERS and other researchers have used these data to understand consumer purchase behavior. However, some researchers question the credibility of the data because they are self-recorded and the recording process is time-consuming.

For most surveys and other primary statistical samples, it is nearly impossible to estimate the accuracy of participants’ self-recorded information. However, because of a unique data overlap, researchers were able to compare Homescan data with retail store data on consumer purchases. This analysis suggests that Homescan data contain some recording errors, but the overall accuracy seems to be in line with other commonly used economic data sets.

A challenge in comparing Homescan data with other data sources is the unique way in which Homescan data are collected. For each shopping trip, participating households record the date and store using a scanner provided by Nielsen. They then scan barcodes of the products purchased and enter the quantity of each item, whether it was bought at the regular or promotional price (such as a loyalty-card discount), and the coupon amount (if used) associated with the purchase. To lessen the time burden, Nielsen does not require households to enter prices for items bought from major supermarket chains for which Nielsen has store-level price data. Instead, Nielsen uses the chain store-level data to construct average weekly prices for the items.
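
The price-imputation step can be illustrated with a minimal Python sketch. It is not Nielsen's actual procedure: the record layout, the function name `average_weekly_prices`, and the choice of a unit-weighted mean are all assumptions made for illustration, since the article says only that average weekly prices are constructed from store-level data.

```python
from collections import defaultdict

def average_weekly_prices(records):
    """Return {(upc, week): unit-weighted mean price}.

    `records` is an iterable of (upc, week, price, units) tuples.
    This layout and the weighting scheme are hypothetical.
    """
    revenue = defaultdict(float)
    units = defaultdict(int)
    for upc, week, price, n in records:
        revenue[(upc, week)] += price * n
        units[(upc, week)] += n
    return {key: revenue[key] / units[key] for key in revenue}

# Made-up store-level sales of one item in one week.
store_records = [
    ("0001", "week-1", 2.00, 3),  # three units sold at the regular price
    ("0001", "week-1", 1.00, 1),  # one unit sold at a loyalty-card price
]
weekly = average_weekly_prices(store_records)
print(weekly[("0001", "week-1")])
```

In this made-up example, neither the shopper who paid 2.00 nor the one who paid 1.00 is assigned the price actually paid. Both receive the weekly average.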

ERS and academic researchers used store checkout data from a major supermarket chain to examine the accuracy of Homescan data. A procedure was developed to match shopping trips in each data set based on the products purchased. Transactions were compared in several ways to determine the similarity of self-reported Homescan information to retailer records. There was no corresponding transaction in the retailer’s data for approximately 20 percent of food-shopping trips recorded in the Nielsen Homescan data, suggesting that the household misrecorded either the store name or the date of the shopping trip.
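
One way to picture matching trips on the products purchased is the following Python sketch. The study's actual procedure is not described here, so the overlap measure, the `min_overlap` threshold, and the function name `match_trip` are hypothetical choices, not the researchers' method.

```python
def match_trip(homescan_upcs, retailer_transactions, min_overlap=0.5):
    """Return the id of the retailer transaction that shares the largest
    fraction of the Homescan trip's UPCs, or None if no transaction
    reaches `min_overlap` (an assumed, illustrative threshold).
    """
    scanned = set(homescan_upcs)
    if not scanned:
        return None
    best_id, best_share = None, 0.0
    for tx_id, upcs in retailer_transactions.items():
        share = len(scanned & set(upcs)) / len(scanned)
        if share > best_share:
            best_id, best_share = tx_id, share
    return best_id if best_share >= min_overlap else None

# Made-up retailer transactions keyed by a hypothetical transaction id.
transactions = {
    "t1": ["a", "b", "c", "d"],
    "t2": ["x", "y"],
}
matched = match_trip(["a", "b", "c"], transactions)
unmatched = match_trip(["q", "r"], transactions)
```

Here `matched` is `"t1"` and `unmatched` is `None`. The second case corresponds to a Homescan trip with no counterpart in the retailer's records.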

For shopping trips that did match, researchers analyzed which items were more likely to be missing in the Homescan trip by grouping the missed items into product categories. Two types of items were commonly missing. The first group included on-the-go consumables such as snacks or small drinks. It is likely that such items were often consumed on the way home, before the purchase could be scanned.

The second group included items in categories containing many products with distinct, yet similar Universal Product Codes (UPCs), such as different-flavored yogurts and baby foods. It is likely that individuals simply scanned one container and entered a large quantity instead of scanning each flavor (which would have a distinct UPC) separately.

The study also found that a greater share of expenditures was missed on larger trips, suggesting that scanning a large number of items at one time may have been too time-consuming for the household. Overall, roughly 20 percent of the items purchased were not recorded by the Homescan panelists. The quantity of recorded items, however, was reported fairly accurately: 94 percent of the quantity information matched in the two data sets. The match for prices was lower; in almost half the cases, the two data sets did not agree. However, much of this difference can be attributed to transactions that involved promotional or other temporary sale prices. In these situations, the likely cause of the price differences is Nielsen’s practice of using store-level data as an estimate of what households actually paid, rather than recording errors by panelists. For prices that involve no promotion or temporary price reduction, there are recording errors in about 17 percent of the cases.
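
The three comparisons for a matched trip (items missing, quantities agreeing, prices agreeing) can be computed along the lines of the Python sketch below. The data layout, the function name `compare_trip`, and the half-cent price tolerance are assumptions for illustration, and the sample figures are invented rather than taken from the study.

```python
def compare_trip(homescan, retailer, tol=0.005):
    """Compare one matched trip.

    Both arguments map upc -> (quantity, unit_price); this layout is
    hypothetical. Returns (share of retailer items missing from Homescan,
    share of recorded items with matching quantity,
    share of recorded items with matching price).
    """
    if not retailer:
        return 0.0, 0.0, 0.0
    recorded = [upc for upc in retailer if upc in homescan]
    missing_share = 1 - len(recorded) / len(retailer)
    if not recorded:
        return missing_share, 0.0, 0.0
    qty_ok = sum(homescan[u][0] == retailer[u][0] for u in recorded)
    price_ok = sum(abs(homescan[u][1] - retailer[u][1]) <= tol for u in recorded)
    return missing_share, qty_ok / len(recorded), price_ok / len(recorded)

# Invented example: item "d" was never scanned, item "b" has a price mismatch.
retailer_trip = {"a": (1, 2.00), "b": (2, 1.50), "c": (1, 0.99), "d": (1, 3.00)}
homescan_trip = {"a": (1, 2.00), "b": (2, 1.75), "c": (1, 0.99)}
missing, qty_match, price_match = compare_trip(homescan_trip, retailer_trip)
```

Averaging such trip-level shares over all matched trips would give summary rates of the kind reported above, although the study's own aggregation may differ.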

Do the recording errors matter? Random errors are less of a concern than systematic ones, that is, errors that are more prevalent for certain items or types of households. Systematic errors could affect statistical analyses and lead to incorrect conclusions. The researchers found that certain demographic measures are more likely to produce inconsistent results between the two data sets. For example, the study shows that age, race, education level, and male employment status affect prices paid for food differently when analyzed in Homescan versus retailers’ data. Although neither data set in this study, nor most data in general, can be 100 percent accurate in all measures, these differences imply that caution may be warranted when drawing conclusions from some research results using Homescan data.