ERS Data Product Quality Standards
The mission of ERS is to inform and enhance public and private decision making on economic and policy issues related to agriculture, food, natural resources, and rural development. To accomplish this mission, ERS economists and social scientists develop and disseminate a broad range of science-based economic and statistical information to the public. This suite of data products encompasses estimates, forecasts, economic and statistical indicators, and data compiled from diverse sources where ERS adds value in the form of recompilation and/or subject-matter expertise. ERS disseminates its information to key stakeholders and the public through an array of outlets, including the ERS website.
As a Principal Federal Statistical Agency, ERS is committed to quality and professional standards of statistical practice. ERS uses modern statistical and economic theory and practice in all technical work; develops strong staff expertise in economics, statistics, and other disciplines relevant to its mission; implements ongoing quality-assurance programs to improve data validity and reliability and to improve the processes of collecting, editing, analyzing, and disseminating data; and develops strong and continuing relationships with appropriate professional organizations in relevant subject-matter areas. To carry out its mission, ERS assumes responsibility for determining sources of data, measurement methods, and methods of data collection and processing; employing appropriate methods of analysis; and ensuring the public availability of the data and documentation of the methods used. Within the constraints of resource availability, ERS continually works to improve the quality of its research and its data systems to provide the information necessary for the formulation of informed public policy.
Statistical Policy Directives and other standards issued by the Office of Management and Budget (OMB) in its role as coordinator of the Federal statistical system1 provide a foundation for individual agencies to achieve and safeguard scientific integrity. Specifically, OMB’s directives and standards are designed to preserve and enhance the objectivity, utility, and transparency of statistical products and their dissemination. Examples include:
- Statistical Policy Directive Number 1 affirms the fundamental responsibilities of Federal Statistical Agencies and defines the requirements governing the design, collection, processing, editing, compilation, storage, analysis, release, and dissemination of statistical information by Federal Statistical Agencies.
- Statistical Policy Directive Number 3 is intended to preserve the time value of principal economic indicators, strike a balance between timeliness and accuracy, prevent early access to information that may affect financial and commodity markets, and preserve the distinction between the policy-neutral release of data by statistical agencies and their interpretation by policy officials.
- Statistical Policy Directive Number 4 enumerates procedures intended to ensure that statistical data releases adhere to data quality standards through equitable, policy-neutral, and timely release of information to the general public.
- Standards and Guidelines for Statistical Surveys documents important technical and managerial practices that Federal agencies are required to adhere to and the level of quality and effort expected in all statistical activities to ensure consistency among and within statistical activities conducted across the Federal Government.2
- Guidelines for Ensuring and Maximizing the Quality, Objectivity, Utility, and Integrity of Information Disseminated by Federal Agencies provides guidance to help agencies ensure and maximize the quality, objectivity, utility, and integrity of the information that they disseminate.3
- OMB Circular A-130 prescribes procedural and analytic guidelines for implementing policies for the management of Federal information resources.
- OMB Memo M-13-13 requires release of information in a way that supports downstream information processing and dissemination activities while ensuring stewardship around confidential information.
To ensure and maximize the quality of information disseminated by Federal agencies, including its objectivity, utility, and integrity, OMB issues guidelines that instruct agencies to treat information quality as integral to every stage of the information life cycle, from creation or collection through processing, dissemination, use, and storage to disposition.
USDA has also published Information Quality Guidelines. Included in the guidelines are:
- Statistical and Financial Guidelines that pertain to information disseminated by USDA agencies and offices that is obtained from original data collections, administrative records, or compilations of data from primary sources, as well as estimates and forecasts derived from statistical models, expert prediction, or a combination of the two.
- A definition of Influential Scientific, Financial, or Statistical Information that states that, in non-rulemaking contexts, USDA agencies and offices will consider two factors—breadth and intensity—in determining whether scientific, financial, or statistical information is influential. "Information that has an intense impact on a broad range of parties should be regarded as influential." 4
ERS continually assesses the needs of data users and provides a range of products that address those needs most effectively. ERS data products fall into four categories, defined by the type of data and value added:
- Primary Data that are made available to the public, such as the Agricultural Resource Management Survey (ARMS) and the National Household Food Acquisition and Purchase Survey (FoodAPS).
- Model Results, such as Food Dollar Series, Food Price Outlook, Food Availability (Per Capita) Data System, Commodity Supply and Use, and Agricultural Trade Multipliers.
- Summary Statistics derived from single or multiple sources of primary data, such as Foreign Agricultural Trade of the United States (FATUS) and U.S. Bioenergy Statistics.
- Repackaged Data derived from existing ERS data products or published research results, such as Charts of Note and State Fact Sheets.
Several data products cut across more than one data type (primary data, model results, and summary statistics), such as Farm Income and Wealth Statistics and Food Security in the United States. GIS-based models, such as the Food Access Research Atlas, also fall into this cross-cutting group.
In addition to characterizing ERS data products by the type of data and value added, criteria have been developed to rank data products according to their adherence to the OMB and USDA definitions of Influential Scientific, Financial, or Statistical Information and their importance to the agency's mission:
- Premier data products that are determined by senior ERS management to be influential and central to the agency’s mission and that adhere to all components of the quality guidelines, as applicable.
- Core data products that are central to the agency's mission but may not meet the definition of influential. Also in this category are foundational data (as mentioned in the ERS Strategic Plan), such as data that are inputs to premier data products.
- Other data products that serve key agency stakeholders and the public.
Data Product Quality Guidelines
OMB guidelines require agencies to develop a process for reviewing the quality of information before it is disseminated to the public to ensure that it meets OMB’s standards for objectivity, utility, and integrity. The guidelines below describe the principles that ERS will follow to ensure these quality standards are embedded in data products provided to key stakeholders and the public. The ERS Data Product Review Council (DPRC) will oversee and implement these standards. The policy is applied broadly to all data products, and each data product must meet each standard of purpose, utility, objectivity, transparency, integrity, and accessibility.
These guidelines recognize that there is a hierarchy among data products based on measurable attributes that distinguish various levels of product quality. All premier products must meet all levels defined in each standard, as applicable by data type. The extent to which each specific standard applies to core and other products may vary, depending on the type of data product. Initially, the DPRC will review each product to determine the extent to which the standards apply and how the products currently conform. Data products other than premier or core should undergo a cost-benefit analysis to ascertain their value to the agency.
All policies and procedures described in this document are applicable to all ERS employees, contractors, visiting scholars, cooperators, or others to whom access to ERS data has been granted.5
OMB Quality Attributes and Standards
ERS is a Principal Federal Statistical Agency whose function is the compilation, analysis, and dissemination of information encompassing a broad spectrum of agriculture, food, the environment, and rural development for statistical purposes. As such, ERS must provide objective, accurate, and timely information on the subject areas in its purview that is relevant to issues of public policy; must maintain credibility among its data and information users; must have the trust of its data providers; and must be independent from political and other undue external influence in conducting its statistical activities.6
ERS coordinates and collaborates with other statistical agencies or professional organizations to enhance the value of its own information and that of other agencies in the Federal statistical system. It is a member of, and actively participates in, many interagency or non-Federal economic and statistical committees and working groups (e.g., USDA’s Interagency Commodity Estimates Committees) to achieve the mission of the Department. It also supports policy/program administration in other USDA agencies by providing economic/statistical analysis and through data-sharing.
Reasons for and methods of data sharing should be documented whenever ERS provides data to, or receives data from, other Federal agencies.
1.1 Restricted (nonpublic), private, and not-yet-published data that are distributed to other Federal agencies on a regular basis and outside the normal/interagency publication process should have documented agreements (e.g., a memorandum of understanding [MOU] or a memorandum of agreement [MOA]) that indicate the purpose of data sharing, the length of the agreement, other relevant terms, and adherence to data security standards.
Routine data-sharing arrangements for restricted (nonpublic) and not-yet-published information between ERS and other Federal agencies should be documented. The preferred method is an MOU, an MOA, or an Interagency Agreement (IAA); in some circumstances, other written agreements may be appropriate. If no such agreement is in place, division staff and data product authors should explore, in conjunction with the Associate Administrator, formalizing the data-sharing arrangement with an MOU/MOA/IAA. If an MOU/MOA/IAA is not appropriate, the justification should be documented.
1.2 Data products that use restricted data from other Federal or external sources must have documented agreements (e.g., MOU, MOA, IAA) that indicate the purpose of data sharing and adherence to data security standards.
Routine receipt of special tabulations and/or nonpublic data by ERS staff from other Federal agencies or external sources should be documented via an MOU/MOA/IAA or other written arrangement that indicates the purpose of data sharing, adherence to data security standards, and other necessary specifications. If an MOU/MOA/IAA is not appropriate, the justification and reasons should be documented.
Utility refers to the usefulness of the information to intended primary users.7 One aspect of utility is fitness for use (data are appropriate for intended audiences). Another aspect is timeliness, which can be measured by two characteristics: the length of the data collection’s production time (e.g., the time from data collection until first release) and the frequency of the data collection or update.
ERS achieves utility for its data products by staying informed of stakeholder needs; developing new data, models, and information products where appropriate and to the extent practicable; making information widely available and easily accessible; updating existing data products in a timely manner; and helping users understand and use its products efficiently and effectively.
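The two timeliness characteristics in the definition of utility can be tracked with simple date arithmetic. The Python sketch below is illustrative only: the function names and dates are hypothetical, and it assumes that production time is measured in calendar days from the end of data collection to first release and that update frequency is summarized as the average interval between consecutive releases.

```python
from datetime import date
from statistics import mean


def production_lag_days(collection_end: date, first_release: date) -> int:
    """Calendar days from the end of data collection to first public release."""
    if first_release < collection_end:
        raise ValueError("first release cannot precede the end of data collection")
    return (first_release - collection_end).days


def mean_release_interval_days(release_dates: list[date]) -> float:
    """Average days between consecutive releases (a proxy for update frequency)."""
    if len(release_dates) < 2:
        raise ValueError("need at least two release dates to measure frequency")
    ordered = sorted(release_dates)
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    return mean(gaps)


# Hypothetical dates, not an actual ERS release schedule.
lag = production_lag_days(date(2022, 12, 31), date(2023, 2, 7))
interval = mean_release_interval_days(
    [date(2023, 2, 7), date(2023, 8, 31), date(2023, 12, 1)]
)
print(f"production lag: {lag} days; mean release interval: {interval:.1f} days")
```

A shorter lag and a shorter interval both indicate more timely information; which one matters more depends on how users rely on a given product.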
2.1 ERS is positioned as the preeminent or sole source provider of its premier data products.
As a Principal Federal Statistical Agency, ERS must be knowledgeable about the issues and requirements of public policy and Federal programs pertinent to the USDA mission and be able to provide objective information that is relevant to policy and program needs. This unique alignment of resources and expertise creates specific capabilities to produce important and influential data products that would otherwise not exist. Adherence is demonstrated by evidence that ERS is the sole source provider of these data (including instances where ERS adds value in the form of recompilation and/or subject-matter expertise) or that the data product is pertinent to the USDA mission and relevant to Federal policy and program needs.
2.2 All data products must be branded (sourced) as coming from ERS.
Data products must be branded as coming from ERS (when they are released by ERS), as standard best practice for documentation, to more accurately measure impact and to improve ERS’s profile as a Principal Federal Statistical Agency. This is demonstrated by links or other evidence of ERS branding on data-product web pages and in tables, charts, etc. As applicable, data products should also cite the source of the input data (such as from multiple Federal agencies).
2.3 Routine updates to premier data products must be publicly reported; accordingly, future releases must be posted on the ERS calendar. If possible, other data products should also provide a schedule for the next update on the ERS website.
To ensure equivalent and timely access for all users, a schedule and mode of release must be developed and publicly conveyed in the calendar year prior to the planned release of a data product.8
2.4 All data products should identify key internal ERS and external stakeholders.
Persons or organizations that have a vested interest in the data and information provided in a data product are considered stakeholders. Knowing the interests, positions, alliances, and importance to ERS of key stakeholders, and interacting with them, enables data product authors to communicate more effectively with these individuals and to adapt to their changing needs.
2.5 Premier data products must undergo formal outside review with key stakeholders for ease of understanding and content relevance every 5 years. At a minimum, informal reviews are encouraged for other data products.
Measures of content relevance and quality of communication can include degree of accessibility,9 clarity, customer satisfaction with ease of use, or other stakeholder interactions from venues such as user conferences, exhibits and other promotional materials, demos, and media citations.
2.6 All data products must have a website feedback mechanism or employ alternative ways to communicate with users.
Recognizing the diversity of data users and their importance, all data products should employ a feedback/input mechanism, based on a strategy of engagement with users, to help facilitate and prioritize data release.10 Website contact forms, for example, allow feedback to be elicited from users in a secure and organized manner. Other forms of communication with data users include public meetings that seek comments on recent and pending changes, and the provision of technical support. Practices employed by various statistical agencies to improve communication with users were highlighted in the 2009 Committee on National Statistics (CNSTAT) review.11
2.7 Web usage statistics for data products should be regularly evaluated and used to improve and/or refine the data product and to set priorities as necessary.
Statistics about use may provide data on visits, page views, product downloads, average time on site, referrers, and number of new/unique visits. This type of information can assist with priority setting and product refinement.
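Standard 2.7 does not specify how usage statistics are computed. As one possibility, the sketch below aggregates several of the measures listed above from a hypothetical, already-anonymized request log. The `Hit` record and its fields are assumptions for illustration; in practice these figures would more likely come from the agency's web analytics service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One logged request; all fields are hypothetical, anonymized identifiers."""
    visitor_id: str
    session_id: str
    kind: str  # "pageview" or "download"
    seconds_on_page: float = 0.0


def summarize_usage(hits: list[Hit]) -> dict[str, float]:
    """Aggregate visits, unique visitors, page views, downloads, and time per visit."""
    sessions = {h.session_id for h in hits}
    total_seconds = sum(h.seconds_on_page for h in hits)
    return {
        "visits": len(sessions),
        "unique_visitors": len({h.visitor_id for h in hits}),
        "page_views": sum(h.kind == "pageview" for h in hits),
        "downloads": sum(h.kind == "download" for h in hits),
        # Average time on site, computed per visit (session), not per page.
        "avg_seconds_per_visit": total_seconds / len(sessions) if sessions else 0.0,
    }


# Hypothetical log: two visitors, three visits.
hits = [
    Hit("v1", "s1", "pageview", 30.0),
    Hit("v1", "s1", "download"),
    Hit("v1", "s2", "pageview", 45.0),
    Hit("v2", "s3", "pageview", 15.0),
]
print(summarize_usage(hits))
```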
Objectivity is a measure of whether disseminated information is accurate and reliable and whether that information is presented in a clear, complete, and unbiased manner.12 Agencies should inform the public as to the strengths and limitations inherent in the information disseminated (e.g., sources of error, degree of reliability, and validity) so that users are fully aware of the quality of the information.13
3.1 All data products are reviewed for data quality prior to dissemination.
Data products produced by ERS are thoroughly reviewed by knowledgeable staff prior to dissemination to verify the accuracy and validity of the data. The procedure used to conduct this review is managed by the Branch Chief and must be documented and available upon request. Data are checked for internal consistency, consistency with other similar data sets or prior year versions of the same data set, and sources of error. Knowledgeable ERS subject-matter experts conduct “reasonableness” checks of the data. Where necessary, the data are edited and missing values are imputed using established statistical techniques to improve the utility of the data.
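Two of the checks described above, internal consistency and consistency with a prior-year version, lend themselves to automation ahead of the subject-matter review. The sketch below is a hypothetical illustration: the item names, the rounding tolerance, and the 25 percent change threshold are assumptions. Flagged items would still go to a knowledgeable analyst for the reasonableness judgment rather than being rejected automatically.

```python
def components_match_total(components: dict[str, float], reported_total: float,
                           tolerance: float = 0.5) -> bool:
    """Internal consistency: components should sum to the published total,
    allowing for rounding within the given tolerance."""
    return abs(sum(components.values()) - reported_total) <= tolerance


def flag_prior_year_changes(current: dict[str, float], prior: dict[str, float],
                            threshold: float = 0.25) -> list[str]:
    """Consistency with the prior-year version: return items whose relative
    change exceeds the threshold, plus items with no usable prior value."""
    flagged = []
    for item, value in current.items():
        old = prior.get(item)
        if old is None or old == 0:
            flagged.append(item)  # new or previously zero item: needs analyst review
        elif abs(value - old) / abs(old) > threshold:
            flagged.append(item)
    return flagged


# Hypothetical values for illustration only.
current = {"item_a": 60.0, "item_b": 40.2, "item_c": 5.0}
prior = {"item_a": 58.0, "item_b": 30.0}
print(components_match_total(current, 105.2))
print(flag_prior_year_changes(current, prior))
```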
3.2 Premier data products must undergo an independent external review of methods at least every 10 years, as appropriate. The agency may elect to solicit external reviews of non-premier data products on an as-needed basis.
External reviewers bring a diversity of perspectives and expertise to the review process, helping ensure that the data product is objective, meaningful, and credible. The breadth and extent of review will be determined by the type of data product. For example, for data products that employ surveys or models, an external review could evaluate procedural methods and statistical validity. For compilations of data, the review could focus on the appropriateness of the data used and the clarity and adequacy of the documentation.
3.3 Where statistically appropriate, all data products must report measures of accuracy that accompany data elements.
Different types of data products might use different accuracy measures. For example, forecast error would be reported for estimates or projections, and estimates of sampling error and nonsampling error components (coverage error, measurement error, nonresponse error, and processing error), to the extent practicable, should be reported for sample survey programs.14 On the other hand, a data compilation can refer users to source agencies for information on data quality.
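For products that publish forecasts or projections, forecast error can be summarized by comparing past forecasts with the values later observed. The sketch below computes mean absolute percentage error (MAPE) and root mean squared error (RMSE). These two measures and the forecast-minus-actual sign convention are assumptions chosen for illustration; the standard does not prescribe a specific statistic.

```python
import math


def forecast_accuracy(forecasts: list[float], actuals: list[float]) -> tuple[float, float]:
    """Return (MAPE in percent, RMSE) for paired forecasts and realized values.

    Errors are defined as forecast minus actual. MAPE is undefined when an
    actual value is zero, so such inputs are rejected.
    """
    if not forecasts or len(forecasts) != len(actuals):
        raise ValueError("need equal-length, non-empty forecast and actual series")
    if any(a == 0 for a in actuals):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [f - a for f, a in zip(forecasts, actuals)]
    n = len(errors)
    mape = 100 * sum(abs(e) / abs(a) for e, a in zip(errors, actuals)) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    return mape, rmse


# Hypothetical forecasts and outcomes for three periods.
mape, rmse = forecast_accuracy([102.0, 98.0, 105.0], [100.0, 100.0, 100.0])
print(f"MAPE: {mape:.2f}%  RMSE: {rmse:.3f}")
```

MAPE is scale-free and easy to communicate to general audiences, while RMSE is in the units of the series and weights large misses more heavily, so reporting both can be informative.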
3.4 Data products should have an ongoing research program that examines methods and operations.
Research on methods and operational procedures must be ongoing for statistical and economic agencies to be innovative and cost-efficient in methods or practices for data collection, analysis, and dissemination. For example, improvements could include methodological research (modeling improvements, such as refining forecast methods and simulations) and/or operational procedures (improving data fielding and increasing the efficiency of data processing).
3.5 The production/dissemination process for premier data products should receive priority for IT investment and must undergo an evaluation of IT approaches every 5 years. If possible and as applicable, other data products should undergo a similar evaluation.
ERS will continue to invest in activities that ensure agency data products are of high quality, meet OMB guidelines and practices of statistical organizations, and meet the highest priority needs of its customers and stakeholders. This includes modern technologies for data collection, processing, management, and dissemination.
OMB requires that Federal agencies offer a high degree of transparency about the data and methods used to derive statistics. These requirements give the American public maximum access to government data and help ensure the reproducibility of government statistics, meaning "the capacity to use the documented methods on the same data set to achieve a consistent result." 15
4.1 Decisions made by ERS to initiate, terminate, or substantially modify the content, form, frequency, or availability of premier data products, and other data products if possible, should trigger appropriate advance public notice.
Stakeholders and the public should be made aware of upcoming changes to premier data products and other data products to the extent possible.16 Major updates/upgrades should be announced on the ERS website calendar of releases, within the data product itself, and, where appropriate, via email notification to stakeholders/subscribers or other types of communication. When warranted, the Office of Communications should be notified directly.
4.2 All data products must be accompanied by accurate, transparent documentation that describes the source of the data, the method used to produce the data, definitions of data items, variables contained in the data set, sources of error, and, if applicable, limitations of the data.
Many analytical problems and misinterpretation of data can be avoided by providing comprehensive documentation. OMB Statistical Policy Directive Number 4 states that "With the exception of compilations of statistical information collected and assembled from other statistical products, these [Federal statistical] products shall contain or reference appropriate information on the strengths and limitations of the methodologies, data sources, and data used to produce them as well as other information such as explanations of other related measures to assist users in the appropriate treatment and interpretation of the data."
OMB provides detailed guidelines for, and a comprehensive list of, necessary components to be included in survey documentation (and other types of Government data to the extent they are applicable) in section 7.3 of the Standards and Guidelines for Statistical Surveys. Some sample documentation elements include a description of variables used to uniquely identify records in the data file; a description of the sample design, including strata and sampling unit identifiers to be used for analysis; and a description of sample weights, including adjustments for nonresponse and benchmarking and how to apply them.
4.3 If data in an ERS product are similar to data reported in other ERS or Federal sources, the data product must explain the differences in a guide to users.
ERS data products should endeavor to clear up potential confusion with other sources of data. In certain situations, they should contain a guide to users that explains how best to use and interpret the data. For ERS data products that contain values derived using similar concepts or contain information that cannot be easily differentiated from other ERS or Federal data sources, a guide helps users fully understand the intended purpose of the data product and identify the statistical series best suited to their needs. In addition, if the product contains similar concepts to those in other ERS data products, then the data product must explicitly discuss differences, if present, in a prominent place within the product. Non-Federal sources should be discussed if appropriate.
4.4 Premier data products must provide information on the update and revision history. If possible, other data products should also provide such information.
Data revisions can occur for a variety of reasons, including inclusion of new data or a change in the data source; seasonal adjustment and/or elimination of calendar effects; transition to a new base period; improvement of methods or a change in classifications, concepts, and definitions; or elimination of errors. To help the public understand and use the data, information should be provided that describes the revisions made, reasons why they were made, and any implications. To ensure transparency of the revision procedure, where applicable, information should be provided that describes the revision procedure and assesses quality changes (for example, differences in data sources and calculation methods).
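Revision information is easier to publish consistently if it is generated by comparing the previous and revised vintages of a series. The sketch below records, for each period whose value changed, the old value, the new value, and the stated reason. The `Revision` record and its fields are hypothetical, and periods that appear only in the revised vintage are treated as new estimates rather than revisions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Revision:
    """One entry in a hypothetical revision log."""
    period: str
    previous: float
    revised: float
    reason: str

    @property
    def change(self) -> float:
        return self.revised - self.previous


def diff_vintages(previous: dict[str, float], revised: dict[str, float],
                  reason: str) -> list[Revision]:
    """List periods present in both vintages whose values differ.

    Periods that appear only in the revised vintage are new estimates,
    not revisions, and are therefore excluded.
    """
    common = sorted(previous.keys() & revised.keys())
    return [Revision(p, previous[p], revised[p], reason)
            for p in common if previous[p] != revised[p]]


# Hypothetical vintages of an annual series.
old = {"2021": 100.0, "2022": 110.0}
new = {"2021": 100.0, "2022": 112.5, "2023": 115.0}
for r in diff_vintages(old, new, reason="incorporated updated source data"):
    print(r.period, r.previous, r.revised, f"{r.change:+.1f}", r.reason)
```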
4.5 All data products must have an archival capability going back 5 years.
For purposes of reproducibility, ERS should be able to provide users with previous releases/versions of data offered in data products, either as part of a data product on the ERS website or upon request.
"Integrity" refers to the security of information—protection of the information from unauthorized access or revision to prevent the information from being compromised through corruption or falsification.17
5.1 All data products that use underlying primary, proprietary, and sensitive data must have a defined procedure for pre-dissemination review to ensure that privacy and confidentiality of individual responses are fully protected and that data are properly secured.
Data products produced by ERS are thoroughly reviewed by knowledgeable staff prior to dissemination to ensure that information is protected commensurate with the risk and magnitude of harm that would result from the loss, misuse, or unauthorized access to or modification of such information. Procedures should be documented and available upon request.
5.2 Data storage and processing, pre-release security procedures, and release procedures will be reviewed every 3 years for all data products.
Procedures for data storage, security, and processing must comply with OMB guidelines18 and the ERS Data Security Policy, particularly for primary, proprietary, and sensitive data. Methods used for pre-release review must conform to applicable security requirements and be documented and available upon request.
5.3 Staff assigned to production of premier and core data products will undergo training for all related policies and standards. If possible, staff assigned to other data products should receive such training.
An effective Federal statistical agency has personnel policies that provide training to encourage the development and retention of a strong professional staff who are committed to the highest standards of quality work. Training can also include efforts to enhance understanding of these guidelines.
Beyond required USDA AgLearn training for all employees, there are several other types of training that are available for data integrity, such as short courses offered by the Joint Program in Survey Methodology (JPSM) for data confidentiality; training provided for Federal statistical agencies on the Confidential Information Protection and Statistical Efficiency Act (CIPSEA), such as courses available in AgLearn or offered by USDA’s National Agricultural Statistics Service; and courses provided for specific data access, such as Census Title 13.
Data products are most valuable when they are made available to the widest range of users for the widest range of purposes and impose no barriers on any person or group of persons. Accessibility refers to the ability of any user to obtain, manipulate, and save data.19
6.1 Data products must be released in common machine-readable formats that facilitate ease of use by a range of audiences.
ERS data products must be released in common machine-readable formats that facilitate ease of use by a range of audiences and minimize the obstacles to using information contained in data files. New data products must conform to this standard, and existing data products will be migrated over time. All data products must also meet Section 508 Accessibility Standards to ensure full access by the visually or hearing impaired.
Machine-readable data (or metadata) are in a format that can be understood by a computer. There are two types: data that are marked up so they can be used by humans and also be read by machines (e.g., screen readers for the visually impaired), and data-file formats intended primarily for processing by machines (e.g., data to be loaded into a database).
Adherence is demonstrated by evidence of: 1) common machine-readable formats intended primarily for processing by machines (e.g., CSV, RDF, XML, JSON) and/or a data model that supports ad hoc analysis;20 2) "open" formats (CSV, JSON, bulk downloads, etc.); and 3) 508 accessibility (e.g., well-structured for screen reader technology; chart legends, alt tags, and D-Links available; and interpretation not reliant on color).
6.2 Premier data products must undergo usability testing in the design/development stage to ensure they are intuitive, navigable, and produce expected results. Usability testing for non-premier data products may occur on an as-needed basis.
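As a minimal illustration of releasing one table in two of the open, machine-readable formats listed above, the sketch below writes the same records as CSV and as JSON using only Python's standard library. The column names and values are placeholders, not an actual ERS data product.

```python
import csv
import io
import json

# Placeholder records; column names and values are hypothetical.
records = [
    {"year": 2022, "item": "example_a", "value": 12.5},
    {"year": 2023, "item": "example_a", "value": 13.1},
]


def to_csv(rows: list[dict]) -> str:
    """Serialize rows as CSV with a header row (one column per dictionary key)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows: list[dict]) -> str:
    """Serialize rows as a JSON array of objects, one object per record."""
    return json.dumps(rows, indent=2)


csv_text = to_csv(records)
json_text = to_json(records)
print(csv_text)
print(json_text)
```

CSV suits spreadsheet users and bulk downloads, while JSON preserves numeric types for programmatic consumers; note that CSV values are read back as strings unless the consumer converts them.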
Usability testing can help ensure data products are designed to meet users’ needs. Informal testing is encouraged for all data products.
6.3 Data products must take steps to conform to OMB Open Data Guidelines.
The above recommendations for data quality address many of the OMB principles for Open Data: Public, Accessible, Described, Reusable, Complete, Timely, and Managed Post-Release.21 Data management procedures will be adopted going forward to support the quality and openness principles.
To implement open data guidelines, ERS data products will be captured in agency and Federal Government metadata inventories. In general, metadata are data that describe data. Structural metadata describe how the data are stored and presented; descriptive metadata describe the data content. The role of the data product manager is to coordinate with the Web Steering Committee and/or Information Services Division to initiate the development of appropriate metadata for the data product, particularly in the case of premier products.
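The distinction between descriptive and structural metadata can be made concrete with a small record. The sketch below is hypothetical: Federal inventories generally follow the Project Open Data metadata schema, but the field list and layout here are a simplified illustration and should be checked against that schema before use.

```python
import json

# Simplified, hypothetical subset of descriptive fields; the governing
# Federal metadata schema defines the authoritative list and structure.
REQUIRED_DESCRIPTIVE = ("title", "description", "keyword", "modified", "publisher")


def build_metadata(descriptive: dict, columns: list[dict],
                   media_type: str = "text/csv") -> dict:
    """Combine descriptive metadata (what the data are about) with
    structural metadata (how the data are stored and presented)."""
    missing = [field for field in REQUIRED_DESCRIPTIVE if field not in descriptive]
    if missing:
        raise ValueError(f"missing descriptive metadata fields: {missing}")
    structural = {"mediaType": media_type, "columns": columns}
    return {"descriptive": descriptive, "structural": structural}


record = build_metadata(
    descriptive={
        "title": "Example data product",
        "description": "Placeholder description of the data content.",
        "keyword": ["example"],
        "modified": "2024-01-01",
        "publisher": "Economic Research Service",
    },
    columns=[
        {"name": "year", "type": "integer", "description": "Calendar year"},
        {"name": "value", "type": "number", "description": "Placeholder measure"},
    ],
)
print(json.dumps(record, indent=2))
```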
2 These standards apply to “Federal censuses and surveys” and, to the extent they are applicable, they “also cover the compilation of statistics based on information collected from individuals or firms…, applications/registrations, or other administrative records.”
4 See Guidelines for Ensuring and Maximizing the Quality, Objectivity, Utility, and Integrity of Information Disseminated by Federal Agencies for a complete explanation of the definition.
8 Statistical Policy Directive Number 4 states that “Prior to the beginning of the calendar year, the releasing statistical agency shall annually provide the public with a schedule of when each regular or recurring statistical product is expected to be released during the upcoming calendar year by publishing it on its Web site.”
10 OMB Memo M-13-13, Open Data Policy: Managing Information as an Asset, Attachment, I. Definitions, Open Data, Managed Post-Release; and III. Policy Requirements, 3c, Strengthen Data Management and Release Practices: create a process to engage with customers to help facilitate and prioritize data release.
15 “Federal Statistical Organizations’ Guidelines for Ensuring and Maximizing the Quality, Objectivity, Utility, and Integrity of Disseminated Information,” Federal Register, June 4, 2002, pp. 38467-70.
16 OMB Circular A-130 states that agencies should “provide adequate notice when initiating, substantially modifying, or terminating significant information dissemination products.” OMB Statistical Policy Directive Number 4 states that “Statistical agencies shall announce, in an appropriate and accessible manner as far in advance of the change as possible, significant planned changes in data collection, analysis, or estimation methods that may affect the interpretation of their data series. In the first report affected by the change, the agency must include a complete description of the change and its effects and place the description on its Internet site, if the report is not otherwise available there.”
20 In general, data models that support one type of analysis may not be ideal for other types of analysis. Formats that give end users flexibility include, for example, XML, JSON, or CSV, but they must be free of formatting imposed by the data product manager or by the software used.