
Re-imagining Public Data to Advance AI: Maximizing Insights from Federal Data


The U.S. Federal Data Strategy is well aligned with Intel’s recommendations that new regulatory initiatives should be comprehensive and technology-neutral, support the free flow of data, and promote access to data while liberating it responsibly. Intel has been, and remains, committed to helping federal agencies accelerate their efforts to make datasets publicly available, unleashing innovative, trusted, and inclusive AI.

Pragmatically Sorting Raw Data Sets 

A significant challenge is that federal data is stored across numerous agencies, resulting in data silos. A mandate to migrate siloed federal data to a single location is impractical, as satisfying such a requirement would consume vast amounts of resources.

To confront this challenge head-on, we recommend that agencies view siloed data not as a barrier but as a unique opportunity, addressed through the pragmatic lens of Secure Federated Machine Learning (SFML). In Figure 1 below, enormous effort goes into migrating data from its existing locations to a single repository. In Figure 2, SFML instead brings the processing mechanisms to the data source for training and inferencing, rather than requiring agencies to migrate data to one place.

Figure 1: Data migrated from agency silos to a single central repository.

Figure 2: SFML brings compute to each data source for training and inferencing.

Such data federation preserves the privacy and security of both the data and the machine learning models, offering additional confidence in post-deployment insights. SFML ensures that (i) data remains in place and compute moves to the data, and (ii) compute and data are protected at the hardware level against attacks on confidentiality and data integrity.
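To make the pattern concrete, below is a minimal sketch of federated averaging (FedAvg), the aggregation scheme most commonly used in federated learning. The toy linear model, agency sizes, and hyperparameters are illustrative assumptions, not details from the source; the hardware-level protections SFML layers on top (for example, trusted execution environments) cannot be shown in plain Python and appear only as comments.

```python
# Minimal FedAvg sketch: compute moves to the data, only weights move back.
import numpy as np

rng = np.random.default_rng(0)

def make_agency_data(n: int) -> tuple[np.ndarray, np.ndarray]:
    """Simulate one agency's private dataset (it never leaves the agency)."""
    X = rng.normal(size=(n, 3))
    true_w = np.array([2.0, -1.0, 0.5])
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    return X, y

def local_update(w: np.ndarray, X: np.ndarray, y: np.ndarray,
                 lr: float = 0.1, steps: int = 10) -> np.ndarray:
    """Gradient steps computed where the data lives; only weights leave."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

# Three siloed "agencies"; in SFML each local update would additionally run
# inside a hardware-protected enclave to defend model and data confidentiality.
agencies = [make_agency_data(n) for n in (200, 500, 300)]
w_global = np.zeros(3)

for round_ in range(20):
    # Each silo trains locally starting from the current global weights.
    local_ws = [local_update(w_global, X, y) for X, y in agencies]
    # The coordinator aggregates weights, weighted by silo size.
    sizes = np.array([len(y) for _, y in agencies], dtype=float)
    w_global = np.average(local_ws, axis=0, weights=sizes)

print("federated estimate:", np.round(w_global, 3))  # ~ [2.0, -1.0, 0.5]
```

Each round, only model weights cross agency boundaries; the raw records never leave their silos, which is exactly the property the two figures above contrast.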

Numerous SFML studies have illustrated its effectiveness, allowing for faster deployment and testing of smarter models, lower latency, and less power consumption. Furthermore, SFML employs a combination of privacy-by-design techniques to ensure data de-identification, data protection, and the security of insights. As the commercial sector plans to leverage federal data for algorithm development, the business interests of those participants must be protected through security techniques ingrained at the lowest level of hardware: silicon.

Avoiding Replication of Biases 

The paradox of data is that while data reflects societal issues, those same issues can shape the data collected, embedding racial, gender, economic, regional, and other biases. To date, nearly 200 human biases have been defined and classified, any one of which can affect how we make decisions. As such, there is a real probability that consuming, processing, and sharing such data will produce discriminatory insights that under-represent the broader population, which could erode trust between humans and the machines that learn from them. Such bias could be particularly amplified in SFML, where learning occurs on each data node rather than on the combined dataset.

Constructing datasets and optimizing AI algorithms to avoid bias are interdependent functions. Continued investment by federal agencies in fostering access to large and reliable datasets is essential to the development and deployment of innovative, trusted, and inclusive AI. Specifically, greater diversity in datasets will reduce the risk of unintended bias. Diversity in the teams working to curate and then release datasets for AI can also address training bias: a diverse, inclusive team of individuals with different skills and approaches to the curation and release of datasets ensures more holistic and ethical designs of AI algorithms.
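As one illustration of what dataset curation can mean in practice, the sketch below flags under-represented groups in a dataset before release. The attribute name, sample data, and 20% threshold are hypothetical choices for the example, not guidance from the source.

```python
# Minimal pre-release representation check for a tabular dataset.
from collections import Counter

def flag_underrepresented(records: list[dict], attribute: str,
                          min_share: float = 0.10) -> list[str]:
    """Return groups whose share of the dataset falls below min_share."""
    counts = Counter(r[attribute] for r in records)
    total = sum(counts.values())
    return [group for group, c in counts.items() if c / total < min_share]

# Hypothetical sample: the region attribute is heavily skewed toward one value.
sample = [{"region": "urban"}] * 90 + [{"region": "rural"}] * 10
print(flag_underrepresented(sample, "region", min_share=0.2))  # ['rural']
```

A check like this is no substitute for diverse curation teams, but it gives those teams a concrete signal to act on before a dataset is released.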

Ethical Liberation of Datasets 

One growing technology trend is the proliferation of mechanisms for data collection and creation. Personal data is not just collected from individuals who provide it for particular uses; it is also observed and gathered by sensors in connected devices, and derived or created through further automated processing. In fact, the share of data coming directly from individuals is shrinking relative to the information collected in our increasingly connected society and inferred through machine learning technologies.

Ethical data processing is built on privacy. Given the increasing amount of data collected, processed, and inferred in the artificial intelligence space, strong encryption and de-identification (full anonymization) techniques serve to protect individuals’ privacy while achieving higher levels of security. Nonetheless, achieving de-identification will require increasingly complex practices, because re-identification will become increasingly feasible in deep learning-driven environments. Differential privacy (DP) techniques have emerged in recent years as viable solutions to minimize privacy risks, adding “noise” to scramble personal data. Furthermore, in the academic and research community, homomorphic encryption (HE) seems particularly promising, as it allows computation on encrypted data, enabling AI tasks without the need to transfer personal information.
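To make the DP idea concrete, the sketch below implements the classic Laplace mechanism for a counting query. The epsilon values, data, and query are illustrative assumptions; production systems also need careful sensitivity analysis and privacy-budget accounting.

```python
# Laplace mechanism: release a count with noise calibrated to epsilon.
import numpy as np

rng = np.random.default_rng(42)

def dp_count(values: np.ndarray, predicate, epsilon: float) -> float:
    """Noisy count satisfying epsilon-differential privacy.

    A counting query changes by at most 1 when one record is added or
    removed (sensitivity = 1), so Laplace noise with scale 1/epsilon
    suffices.
    """
    true_count = int(np.sum(predicate(values)))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

ages = rng.integers(18, 90, size=10_000)
# Smaller epsilon -> stronger privacy -> noisier answer.
for eps in (0.1, 1.0):
    print(eps, round(dp_count(ages, lambda a: a > 65, eps), 1))
```

A smaller epsilon gives stronger privacy at the cost of a noisier answer; that trade-off is what an agency would tune when releasing statistics derived from personal data.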

Thus, along with government investment in the development of international standards for algorithmic explainability and the promotion of diversity in datasets, increased investment, research and development, and transparency of outcome-based studies on de-identification techniques like DP and HE are in the U.S. national interest to further drive innovation in the marketplace. Finally, in practice, data de-identification techniques should be uniformly applied across agencies to ensure that (i) data is anonymized and (ii) combining anonymized data across silos does not result in the re-identification of subjects.

Standards and Frameworks Implementation 

Data is a complex field, as it is subject to a variety of regulatory regimes covering privacy, data sovereignty, localization, and cross-border transfers. This subject-matter variance cultivates disharmony among regulatory approaches and stifles the development of regulations that rely on voluntary standards to evolve technical requirements for AI data use cases.

Access to large, reliable datasets is essential to the development and deployment of robust and trusted AI. Standards and guidelines play an important role in developing approaches for access to AI datasets, but they need to be carefully defined for different use-case contexts and must consider common ethical concerns, including privacy regulations. Data-related standards, such as metadata and format interoperability standards (a sample metadata record is sketched after the list below), can:

  1. make public sources of information available in structured and accessible databases;
  2. create reliable datasets (while employing data de-identification techniques) for use by all AI developers to test automated solutions and benchmark program quality; and
  3. foster incentives for data sharing between the public and private sectors and among industry players.
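As a concrete illustration of the metadata point above, here is a sketch of a dataset record loosely modeled on the DCAT-US (Project Open Data) schema used on Data.gov; all field values and URLs are hypothetical, and the schema itself should be consulted for the authoritative field list.

```python
# Hypothetical dataset metadata record, loosely following DCAT-US fields.
import json

record = {
    "title": "Example Agency Air Quality Measurements",
    "description": "Hourly sensor readings, de-identified before release.",
    "keyword": ["air quality", "sensors", "environment"],
    "modified": "2020-01-15",
    "publisher": {"name": "Example Federal Agency"},
    "identifier": "example-agency-air-quality-2020",
    "accessLevel": "public",
    "license": "https://creativecommons.org/publicdomain/zero/1.0/",
    "distribution": [
        {"mediaType": "text/csv",
         "downloadURL": "https://data.example.gov/air-quality.csv"}
    ],
}

print(json.dumps(record, indent=2))
```

Uniform records like this are what make siloed datasets discoverable and machine-readable across agencies, even when the underlying data never moves.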

In areas similarly affected by privacy regulations, international standards have successfully defined mechanisms that support a harmonized approach to regulatory objectives. Examples include the Do Not Track standard (under W3C) and the anonymous signature and authentication standards (under ISO/IEC JTC 1 SC 27).

Re-imagining Data 

A tremendous amount of data is generated daily and must be stored, secured, and organized. More importantly, the value that all of this data represents is nearly immeasurable, and that value comes from analysis and the resulting insights. As the U.S. Federal Government embraces data as a strategic asset, society may experience the next great business opportunity, societal advancement, or scientific discovery.


