AI Intuition

I have an ambitious goal: to help you gain strong intuition about Artificial Intelligence, and in particular the role of Deep Learning in enabling modern-day AI magic, all over the course of several bite-size posts. I will be happy if, at the very least, my efforts help you build a mental roadmap for further deepening your knowledge of the vast AI subject.

AI applications

Let’s start with what, in broad strokes, various AI applications are aimed at:

  • Identifying and generating patterns, expected or novel ones. This falls into the Generative AI subcategory: ChatGPT, language translation, sentiment analysis, succinct summaries, speech and visual synthesis, new protein foldings, etc. all belong here.
  • Classifying and segmenting observations in complex settings. This falls under Computer Vision.
  • Intelligently guiding robots autonomously while learning new environments in real time.

How AI achieves objectives

To achieve all the objectives mentioned above, AI needs to be able to:

  • Know the world to some extent (as applicable to a specific application)
  • Deal with the complexity and ambiguity of world knowledge
  • Make appropriate decisions in the face of uncertainty, where there are incredibly many plausible choices to pick from
  • Continuously learn about the world (or the primary subject of concern)

Different science branches chip in

Let’s see what it takes to meet those expectations and how different branches of science help solve the ensuing AI challenges.

Uncertainties and complexities of the world

The world is complex. Massive amounts of information about it need to be internalized. Many things are uncertain. On the one hand, uncertainty comes from the seemingly endless possibilities that AI has to deal with. On the other hand, AI faces domain complexities that it needs to capture, yet they are neither readily explainable nor easily navigable. And even when AI has a clear idea of what it is dealing with at a given moment, the challenge remains: how do you make a series of optimal decisions leading to intelligent outcomes, such as completing ideas, making translations, or solving math problems?

Training

In order to know the world, AI needs to be trained by being exposed to massive amounts of world information. It is infeasible to handcraft that knowledge and somehow code it into AI: world knowledge is too abundant, too complex, and constantly changing. Information needs to be processed and turned into internal knowledge automatically, so that it is readily navigable and actionable.

Knowledge representation

Hence arises the necessity to figure out:

  • How to codify knowledge
  • How to store it
  • How to navigate through it

To achieve these key AI objectives, knowledge needs a rich representation. This means compacting essential contextual information into as small a footprint as possible, in a shape and form that can be easily interpreted, processed, and correlated with other bits and pieces of information. Information Theory provides AI with the means to effectively compress complex information and communicate it without losing its essence.
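
To make “compression without losing the essence” a bit more tangible, here is a tiny Python sketch (the word frequencies are invented for illustration) that computes the Shannon entropy of a toy vocabulary, the theoretical lower bound on how compactly its information can be encoded:

```python
import math

# A toy "source": four words with invented, uneven frequencies.
# Information Theory says the best lossless code needs, on average,
# H(p) bits per word (the Shannon entropy): the irreducible essence.
probs = {"the": 0.5, "cat": 0.25, "sat": 0.125, "mat": 0.125}

entropy = -sum(p * math.log2(p) for p in probs.values())
print(f"{entropy} bits/word")  # 1.75, versus 2.0 for a naive fixed-length code
```

Frequent words earn short codes and rare words long ones; rich representations spend their limited footprint in the same spirit, on what matters most.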

Dealing with uncertainty

Now, for effective knowledge navigation, AI has to deal with a lot of uncertainty, which means:

  • endless choices of related knowledge to choose from, and
  • absence of precise rules that govern a process of making many interrelated choices.

This is where Probability Theory (“how do you arrive at the pool of plausible choices”) and Decision Theory (“exactly what, and how, to choose from that pool”) come into play, as the sketch below illustrates.
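
As a hypothetical illustration (the candidate words and their scores are invented; a real LLM weighs tens of thousands of tokens at every step), here is how raw model scores become a pool of plausible choices and how a single decision is then made from that pool:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented candidate next words and the raw scores (logits) a model
# might assign to them at one step of a completion
vocab = ["cat", "dog", "car", "idea"]
logits = np.array([2.0, 1.5, 0.2, -1.0])

# Probability Theory: softmax turns raw scores into a distribution,
# i.e. the pool of plausible choices
probs = np.exp(logits) / np.exp(logits).sum()

# Decision Theory: commit to one choice, either greedily (safest pick)...
greedy = vocab[int(np.argmax(probs))]

# ...or by sampling, which allows the improvisation that feels creative
sampled = vocab[rng.choice(len(vocab), p=probs)]

print(dict(zip(vocab, probs.round(3))), greedy, sampled)
```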

Bringing order to information “chaos” through Deep Learning

While navigating vast knowledge, it is important that AI can discern the key concepts and reasoning building blocks, the invariants, from the acceptable deviations or improvisations that make AI seem creative. The building blocks are hierarchical, and their combinations make up the complex objects and concepts that help AI “reason” (in its own special way). To achieve that, AI needs to map huge world knowledge onto a much smaller dimensional space, where it can sort out what really matters and what can be improvised, and learn how to effectively navigate the sea of knowledge. This is what the whole body of Deep Learning achieves by means of Deep Neural Networks (DNNs).
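
Deep networks learn far richer, non-linear mappings, but a classic linear technique such as PCA (sketched below on random stand-in data) gives a feel for squeezing high-dimensional information into a much smaller space while keeping the directions that matter most:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in "knowledge": 1,000 items, each described by 512 raw features
X = rng.normal(size=(1000, 512))

# Center the data, then use SVD (the core of PCA) to find the few
# directions along which the data varies the most
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Keep only the top 16 directions: each 512-d item becomes a 16-d summary
Z = Xc @ Vt[:16].T
print(X.shape, "->", Z.shape)  # (1000, 512) -> (1000, 16)
```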

Deep Neural Network in action

A Deep Neural Network continuously transforms and maps real-world knowledge into different spaces, in the hope of compressing the original information to a level where AI can mathematically discern intricate properties of the acquired knowledge, something that is impossible to achieve by hand. A DNN does this in a multistage fashion, hence the concept of layers in Deep Neural Networks. Each layer (there can be tens to hundreds of them) has many neurons (ranging from tens to thousands) that can be thought of as micro decision makers. Now picture billions of neurons feverishly collaborating to reach consensus and satisfy your request to complete your ChatGPT prompt. Crazy, isn’t it?
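
To get a feel for those layered micro decision makers, here is a deliberately tiny sketch with random weights and made-up sizes (a real model learns billions of weights during training):

```python
import numpy as np

rng = np.random.default_rng(2)

def layer(x, W, b):
    # One layer of micro decision makers: each neuron mixes its inputs
    # (x @ W + b), then makes a soft yes/no call via a non-linearity
    return np.maximum(0.0, x @ W + b)  # ReLU activation

# A toy 3-layer network: 8 inputs -> 16 -> 16 -> 4 outputs
sizes = [8, 16, 16, 4]
Ws = [rng.normal(scale=0.5, size=(m, n)) for m, n in zip(sizes, sizes[1:])]
bs = [np.zeros(n) for n in sizes[1:]]

x = rng.normal(size=8)      # an input observation
for W, b in zip(Ws, bs):    # the multistage, layer-by-layer transformation
    x = layer(x, W, b)
print(x)                    # the network’s final transformed representation
```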

An important part of the DNN machinery is its seemingly weird codification of knowledge in the form of rich representations: embeddings and model parameters. Under the hood they are realized by DNN architectures such as the transformer. In essence, this is where Manifold Theory chimes in, along with Information and Probability theories. Manifolds are malleable substrates, freeway mazes if you wish, that ensure the fluidity of massive information flow and its proper routing to plausible outcomes (e.g. idea completion, sentence translation, sentiment analysis, etc.).
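
To hint at what a rich representation buys us, here is a sketch with hand-made 4-dimensional embeddings (these vectors are invented; real learned embeddings have hundreds or thousands of dimensions): nearby vectors encode related meanings, which is exactly what makes knowledge navigable:

```python
import numpy as np

# Invented 4-d embeddings; related concepts are given nearby vectors
emb = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "queen": np.array([0.9, 0.7, 0.2, 0.1]),
    "apple": np.array([0.1, 0.0, 0.9, 0.8]),
}

def cosine(a, b):
    # Cosine similarity: values near 1.0 mean the vectors point the same way
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(emb["king"], emb["queen"]))  # high: related meanings
print(cosine(emb["king"], emb["apple"]))  # low: unrelated meanings
```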

Putting it all together: a Deep Neural Network, being a non-linear mathematical approximation function, ensures proper transformation of the information on the go, so that its properties, encoded in adequate representations, seamlessly blend with the probabilities (uncertainties) of choices all along the flow, from the original prompt (“what has been asked”) to the intelligent outcome (“the prompt completed with ideas”).

OK, I’ll stop here. It was a bit dense, I admit. What you have read up to this point is a grand plan, a primer on how Deep Learning enabled AI works. I wanted to convey an appreciation of how, at a very high level, different theories play a substantial role in the AI magic we witness every day. Let’s see what lies ahead.


Large Language Models

Now, to dissect in more approachable detail why AI works, the following posts will revolve around Large Language Models, arguably the most intricate yet accessible and representative paradigm of modern AI glory.

Large Language Models (LLMs) are the bedrock of Generative AI, and they raise plenty of questions:

  • What are they?
  • Why should we trust them?
  • To what extent can we rely on them?
  • When should we custom-tailor LLMs?
  • Train them from scratch? or
  • Fine-tune existing foundation models to our needs? or
  • Build our own outright?

The blogosphere is replete with all kinds of recipes: which LLMs are available out there, how to use and enhance them, how to build your own LLMs, and so on. But there is not much material dedicated to why LLMs are so potent and fantastic, borderline magical. There are not many sources addressing a layman’s concerns:

  • Why should they trust LLMs to begin with?
  • What are the scientific underpinnings of LLMs, and can they be intuitively conveyed to the uninitiated without an advanced STEM degree?

A mini roadmap

The goal of the posts in the Why track is to help build solid intuition around LLMs. Here is a mini-roadmap that I intend to cover shortly:

Importance of Language

Why Large Language Models (GPT, Llama, Gemini, etc.) and Generative AI as a whole stand out from the rest of the AI crowd.

World Knowledge: Compression and Representation

The importance of properly compressing massive information, and of proper representation, in order to effectively infer new intelligent outcomes. We will talk about the phenomenal embeddings and how LLMs rely heavily on Information Theory to achieve this.

LLM foundations: a Brain and DNA analogy

Making sense of Large Language Model and Deep Neural Network magic by drawing a loose analogy with the human DNA and brain combo. Similar concepts, different hardware.

Manifolds and representations

Dwelling on how LLMs reason. Manifold Theory is often overlooked, yet it is absolutely crucial for understanding the LLM reasoning magic.

LLM Training and Inference

Going deeper into what constitutes LLM Training and Inference.

LLMs are Dynamical Systems with Phase Transitions

LLMs’ emergent abilities, rivaling or exceeding humans’, stem from their resemblance to dynamical systems with phase transitions.


Well, I hope to deliver on my promise… 🙂

