Be a Responsible AI Researcher

AI is reshaping society as we speak, offering vast potential for positive impact in many areas. Yet there is also the risk of misuse and unintended consequences. Admittedly, ethical debates often happen in response to issues rather than preemptively. This should not be the case.

AI ethics is a large and complex space. It is important not to overlook that ethical AI presents a collective action dilemma: it is not a matter for policy makers, governments, corporations, and scientists alone. We cannot be bystanders; all of us should actively chip in to shape our near- and long-term future in light of rapid AI advancement.

Perils of ignoring AI Ethics

Here I primarily talk about AI researchers’ stance on the ethical considerations surrounding deep learning and AI. Modern AI systems draw on the best that different branches of science have to offer, and they are very hard for non-specialists to reason about. Indeed, the remaining 99.999% of the world is at the mercy of the ingenuity of AI scientists and engineers.

In the age of the corporate arms race and LLM land grab, with hundreds of billions of dollars poured in, it is all too easy to fall into all kinds of temptations (e.g. to earn big bucks, gain celebrity status, etc.) or, quite the opposite, to succumb to a hopeless feeling of personal irrelevance in the midst of the AI revolution. In the end, failing to cast a critical eye on our inactions or wrong actions can bring humanity to its demise.

You want to be a responsible AI researcher. Celebrity status is just a bonus

The AI revolution, unlike all previous technological revolutions (Gutenberg’s press, electricity, the internet), penetrates all aspects of human life massively, easily, all over the planet, and very fast. Did most of us really know anything or care about AI before the end of 2022?

Something is particularly acute this time around, and not so obvious compared to previous technological revolutions:

  • AI scientists, researchers, and engineers should contemplate the moral implications of their work before they start it. While not every ethical issue falls under the control of individual computer scientists, researchers still bear the responsibility to consider and address potential misuse of the systems they develop. Not as an afterthought or someone else’s job.

At the same time, celebrated scientific AI achievements should absolutely be subject to social and philosophical scrutiny, given their unprecedented potential impact on humanity. This inevitably creates tensions in society between ardent supporters of AI advancement and the naysayers. But these frictions are acceptable in the name of a prosperous and fair future for humanity.

Responsible AI research

AI students and professionals might believe that their work is too distant from reality or just a small cog in a larger machine, and thus their actions couldn’t have a significant impact. However, this assumption is misguided. Researchers frequently have the power to choose:

  • the projects they dedicate their time to, 
  • the companies or institutions they align with, 
  • the knowledge they pursue,
  • the social and intellectual circles they engage in, and how they communicate with the rest of the world.

Acting in accordance with ethical principles, whatever they may entail, often resembles a social dilemma where the most favorable outcomes depend on cooperation, even though it may not seem advantageous for any individual to cooperate:

  • Responsible AI research presents a collective action problem.

Scientific temptations

Every day we are under the spell of a barrage of announcements from AI labs about yet another better-bigger-faster LLM. Misinformation about their practical potency tends to propagate faster and endure longer than truth across social networks. Therefore, it is crucial to refrain from exaggerating the capabilities of AI systems and to avoid misleading anthropomorphism, particularly when AI is compared to human cognitive abilities. For example, an LLM scoring 90% on the MCAT comes nowhere close to demonstrating that it is equivalent to a medical doctor candidate.

Additionally, being cognizant of the potential misapplication of machine learning techniques is essential. The modern AI researcher should be guided not only by LLM parameter counts, FLOPs, and inference speed, but should also be mindful of harmful consequences, intentional or not, of flawed design and application of AI stemming from:

  • bias
  • lack of interpretability
  • weaponization
  • privacy violations
  • environmental impact
  • fraud
  • etc.

Value alignment 

In crafting AI systems, our aim is to ensure their “values” or objectives are congruent with those of humanity, a challenge often termed the value alignment problem. This challenge is multifaceted: 

  • defining our values comprehensively and accurately proves difficult,
  • encoding these values as objectives within an AI model poses challenges,
  • guaranteeing the model learns to fulfill these objectives presents further complexities.

Defining our values is in itself extremely complex. How do you take cultural and demographic specificities into consideration? There is no one-size-fits-all answer that happily addresses everyone on our diverse planet.

Encoding our values poses further challenges, even if we somehow agreed on them. Without going into much detail: technically, an AI system optimizes a mathematical loss function that only indirectly captures value alignment. If this loss function is not carefully crafted (e.g. wrong reward criteria, training data, or algorithm implementation), then in its overzealous optimization effort the AI will diverge from the values we intended.

This is one example of where scientists should think about the consequences of careless choices; the toy sketch below shows how a poorly chosen proxy objective gets optimized at the expense of the intended value.
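As a toy illustration (the items, scores, and reward function here are entirely hypothetical, not drawn from any real system), the following sketch shows how optimizing a proxy reward that only measures "engagement" selects an outcome the intended value of "user well-being" would reject:

```python
# Toy sketch of reward misspecification (hypothetical values, not a real system).
# The intended value is "user well-being", but the proxy reward only measures
# "engagement". Maximizing the proxy picks the item that hurts the real objective.

items = [
    # (name, engagement_score, well_being_score)
    ("balanced news digest",  0.55, 0.80),
    ("sensational clickbait", 0.95, 0.20),
    ("helpful tutorial",      0.60, 0.90),
]

def proxy_reward(item):
    """What we actually encoded in the objective: engagement only."""
    _, engagement, _ = item
    return engagement

def intended_value(item):
    """What we meant to optimize: user well-being."""
    _, _, well_being = item
    return well_being

chosen = max(items, key=proxy_reward)    # what the optimizer maximizes
wanted = max(items, key=intended_value)  # what we actually wanted

print("optimizer picks:", chosen[0])     # -> sensational clickbait
print("we wanted      :", wanted[0])     # -> helpful tutorial
```

The gap between the two outputs is the misalignment; no amount of better optimization of the proxy closes it.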

Bias

When we talk about bias in society we refer to the inclination or prejudice towards a particular perspective, idea, or group, often resulting in unfair treatment or judgment. In a strictly scientific context, bias indicates a statistical deviation from an established norm. 

In the realm of AI, bias becomes problematic when it arises from inappropriate factors that influence outcomes. There are several ways in which biases creep into large language models, and scientists and researchers need to be careful about each of them:

Value alignment – as we alluded to in the previous section on value alignment, deciding on a model’s objectives necessitates making value-based judgments about our priorities. This can introduce biases. Additionally, if we do not effectively translate these decisions into practice (encoding values) and the value alignment does not accurately reflect our intended objectives, biases may be further amplified.

Quality of data – LLMs are trained on massive, diverse data, and a model is only as good as the quality of its data. Algorithmic bias may arise when the dataset is not representative or lacks completeness. Even if the training data are comprehensive and accurately represent the society in which they were generated, biases may persist if that society is inherently biased against marginalized communities. So an AI that was supposed to make the world more equitable may instead further perpetuate the social problem.

Fairness – choosing a mathematical definition to assess model fairness is itself a subjective, value-laden decision. In reality there are multiple intuitive definitions of fairness, and they can be logically irreconcilable when expressed mathematically (see the sketch after this list). This necessitates augmenting purely mathematical conceptualizations of fairness with a more comprehensive evaluation of whether algorithms contribute to justice in real-world applications. RLHF (Reinforcement Learning from Human Feedback) is part of today’s solution, though it is of questionable scalability and cost-effectiveness.

Dependencies – modern LLMs are often composites of several models (Mixture-of-Experts is one variation). When multiple models and/or AI agents interact without having gone through a universal, cohesive de-biasing process (i.e. each expert model was trained with different algorithms and datasets of varying quality), biases can worsen and new societal damage can emerge.
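To make the fairness tension above concrete, here is a minimal sketch (with entirely hypothetical predictions) comparing two common fairness notions, demographic parity and equal opportunity, on the same toy data. Satisfying one does not imply the other:

```python
# Minimal sketch (hypothetical numbers): two intuitive fairness notions
# evaluated on the same toy predictions can point in different directions.

# Each record: (group, true_label, predicted_label)
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 0, 0), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 0, 1), ("B", 0, 0),
]

def selection_rate(group):
    """Demographic parity compares P(prediction = 1) across groups."""
    rows = [r for r in records if r[0] == group]
    return sum(pred for _, _, pred in rows) / len(rows)

def true_positive_rate(group):
    """Equal opportunity compares P(prediction = 1 | true label = 1) across groups."""
    rows = [r for r in records if r[0] == group and r[1] == 1]
    return sum(pred for _, _, pred in rows) / len(rows)

for g in ("A", "B"):
    print(g, "selection rate:", selection_rate(g), "TPR:", true_positive_rate(g))

# Both groups are selected at the same rate of 0.5 (demographic parity holds),
# yet qualified members of group B are selected only half as often as those of
# group A (TPR 0.5 vs 1.0, so equal opportunity is violated).
```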

Certainly, efforts can be made to guarantee that data are varied, inclusive, and thorough. However, when the society producing the training data is inherently biased against marginalized groups, biases may still manifest even with entirely precise datasets. Thus given the potential for algorithmic bias and the underrepresentation in training datasets mentioned earlier, it’s crucial to also examine how error rates in the outputs of these systems may further intensify discrimination against marginalized communities.

If historical and social contexts are only fragmentarily represented in training data, with no nuanced connections between them, LLMs could further solidify and perpetuate structures of power and oppression, societal stereotypes and injustices, and the inhuman practices of past historical periods.

Moral Judgment of AI systems

Researchers and scientists need to bear in mind the moral judgment capabilities of AI systems they work on before incorporating them into society. 

As LLMs’ emergent capabilities resembling reasoning increase, so do our expectations that, once they are wrapped into autonomous AI systems, they will be capable of making moral judgments with the necessary built-in safeguards.

In many decision contexts, actions bear no moral significance: no one is going to die over the next move in a game of Go. In other scenarios, however, actions do carry moral weight. This is evident in decision-making for autonomous vehicles, lethal autonomous weapons systems, and all kinds of patient care systems. With increasing autonomy, these systems may need to make moral judgments without direct human intervention.

Hence there are progressively complex levels of moral judgment that one would expect an autonomous AI system to meet before introducing it to the real world: a system

  • that is aware of its impact on society,
  • that has inherent safeguards to prevent adverse impact,
  • that sticks to moral principles and ethical conduct, and ultimately
  • that has full autonomy and awareness of its actions.

You need a whole new suite of behavioral benchmarks for that. They do not yet exist, and they are extremely tough to develop. Today’s RLHF efforts pale significantly in comparison to what is actually needed.

Transparency and interpretability

We discussed above why the value alignment of AI systems is so important when assessing their potential impact on society. Without transparency or interpretability, an information gap emerges between the user and the AI system, complicating efforts to ensure alignment with values.

A sophisticated computational system achieves transparency when all its operational intricacies are fully disclosed. Interpretability, on the other hand, refers to the degree to which humans can comprehend the system’s decision-making process.

Transparency entails an understanding of:

  • how AI algorithms function. A deep neural network is one extremely complex nonlinear approximation function that can have hundreds of layers, hundreds of thousands of neurons, and hundreds of billions of parameters continuously mapping inputs to outputs. This function is extremely difficult to reason about,
  • how the AI system executes the algorithms at runtime. 

Because there are so many dependencies on the quality of data, the makeup of compute, and the nature of prompts, it is hard to truly assess the transparency of LLMs. Given that deep neural networks can comprise billions of parameters, it is practically impossible to comprehend their functioning solely through examination.
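One common, much more modest probe than full transparency is input-gradient saliency, which asks which input features most influenced a single prediction. Below is a minimal PyTorch sketch (a tiny stand-in network with hypothetical sizes, not an LLM); it yields only a local approximation of feature influence, not a genuine explanation of the model:

```python
# Minimal sketch of one common interpretability probe: input-gradient saliency.
# It estimates which input features most influence one prediction; it is a
# local, approximate view, not full transparency.
import torch
import torch.nn as nn

torch.manual_seed(0)

# A small stand-in model; real LLMs have billions of parameters.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))

x = torch.randn(1, 16, requires_grad=True)  # one input example
logits = model(x)
logits[0, logits.argmax()].backward()       # gradient of the top-class score w.r.t. the input

saliency = x.grad.abs().squeeze()           # per-feature influence estimate
print("most influential input features:", saliency.topk(3).indices.tolist())
```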

It’s uncertain if we can develop intricate decision-making systems that are completely understandable to both users and creators. Additionally, ongoing discussions revolve around the definitions of understandability and interpretability within such systems, with no concrete consensus yet established.

It is worth mentioning the efforts spearheaded by Chris Olah and Max Tegmark in the area of Mechanistic Interpretability.


This concludes the first article in the AI Impact series. To be continued with more exciting posts. Stay tuned…

It is always a good idea to refresh some foundations in Why.

