💠

Understanding LLMs

This course is not a science lesson. It will only cover the topics you need to know to meaningfully extract value out of AI systems. It will not cover the mathematics, or what it takes to become an AI engineer. Leave that to the nerds.

But you should know the key concepts.

A large language model, or LLM, is just an AI system that’s been trained on enormous amounts of text. Billions of words from books, websites and more.

They learn how language works by spotting patterns, allowing them to understand and create text that sounds human.

The models do not ‘think’ like us - but they are phenomenally good at guessing what comes next.
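
Here’s a toy sketch of that ‘guess what comes next’ idea. Real LLMs use neural networks over tokens, not simple word counts - the corpus and the predict_next helper below are invented purely for illustration:

    from collections import Counter, defaultdict

    # A toy 'language model': learn which word tends to follow which.
    corpus = "the cat sat on the mat the cat ate the fish".split()

    follows = defaultdict(Counter)
    for current_word, next_word in zip(corpus, corpus[1:]):
        follows[current_word][next_word] += 1

    def predict_next(word):
        # Guess the most common continuation seen in training.
        return follows[word].most_common(1)[0][0]

    print(predict_next("the"))  # -> 'cat' (follows 'the' most often)
    print(predict_next("cat"))  # -> 'sat'

Scale that pattern-spotting up massively and you have the gist of an LLM.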

The ‘thinking models’ you hear about are basically the same technology wrapped in a series of orchestrated steps. The model takes your query, seeks to understand and expand what it means, and prompt-chains over and over again - to get a better read on what’s being asked. These models simulate how a human would ‘reason’ through a problem. It’s not human thinking, but it does deliver much more powerful responses.

It’s good to state clearly what these models can do:

  • Answer questions
  • Write essays, poems or stories
  • Translate languages
  • Summarise or expand text
  • Write code

They are not flawless on their own. In fact, the less constrained and less specific you are, the more likely they are to make mistakes. That does not, however, mean they are not incredibly powerful.

Now here is something that is often overlooked.

Large Language Models can Write Code.

They can access all of mathematics.

They can recite real-world algorithms. They can write software that accesses APIs.
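
As a minimal sketch of that last point, here’s the sort of API-calling program an LLM can write on request. The endpoint URL below is a made-up placeholder, not a real service:

    import json
    import urllib.request

    # Fetch JSON from an API and print it.
    # NOTE: hypothetical endpoint - substitute a real API you use.
    url = "https://api.example.com/data"

    with urllib.request.urlopen(url) as response:
        payload = json.load(response)

    print(payload)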

As a rule, large language models are NOT good at doing mathematical calculations themselves. They are language models, not calculators. They are, however, phenomenal at writing software that performs real calculations under human instruction.

That means YOU - armed with an LLM - could potentially solve exceptionally complex mathematical and scientific problems.
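
A trivial example of the pattern: don’t ask the model for the answer, ask it for the code. This is plain standard-library Python, the kind of thing any capable model can produce on demand:

    from fractions import Fraction

    # Asked in chat, a model may guess at arithmetic like this.
    # Asked for code, it can produce an exact answer instead.
    print(123456789 * 987654321)  # 121932631112635269, exactly

    # Exact rational arithmetic - no floating point rounding at all.
    print(Fraction(1, 3) + Fraction(1, 6))  # 1/2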

Most people don’t know this because they don’t actually understand the technology. This is the real power of AI. Never before in history have humans had so much power. These models hold the sum of human knowledge, and they have the ability to massage it with code.

Think of an AI model like a Rubik’s Cube. And think of the ability to code as your hands.

Independently the Rubik’s Cube is not that valuable. But combine the two, and it’s a superpower.

Defining a token - the building block of AI

Isn’t it interesting that we gave the unit of measure of AI a monetary name: the token.

I don’t think that’s an accident.

An LLM is like a word chef. It chops text into bite-sized pieces called tokens. Tokens are the basic units of AI consumption, and a token is roughly three-quarters the length of a common word:

  • A whole word like ‘dog’ might be a single token.
  • ‘Running’ may be two tokens - ‘run’ and ‘ning’.
  • Punctuation marks like ‘?’ or ‘,’ might be tokens of their own.
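
You can watch the chopping happen with OpenAI’s open-source tiktoken tokenizer (pip install tiktoken). Other vendors use different tokenizers, so the exact splits and counts will vary:

    import tiktoken  # OpenAI's tokenizer; other vendors differ

    enc = tiktoken.get_encoding("cl100k_base")

    for text in ["dog", "running", "Hello, world?"]:
        token_ids = enc.encode(text)
        pieces = [enc.decode([t]) for t in token_ids]
        print(f"{text!r} -> {len(token_ids)} token(s): {pieces}")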

And tokens cost money.

Tokens are important for the following reasons:

  • Output is generated token by token, so it takes time to return. You may want to choose models with a good level of speed. Grok is MUCH faster than Anthropic’s models, for example, and I think Google’s may be even faster.
  • There are limits to the number of tokens that can be consumed in a single transaction, and the disparity between models can be absolutely massive.
  • The free version of Microsoft Copilot had a limit of 8,000 tokens per prompt at one stage - and that includes the input prompt tokens, the calculation tokens and the returned output tokens. Whereas some of Google’s models now deliver 2M-token context windows, and there is talk of 200M-token models coming that will cost considerably more, aimed at mass data-ingestion use cases.
  • Most large language model technology is practically delivered on cloud infrastructure by third parties. It is possible to run large language models on local computers, but these tend to be ‘mini’ models that are a lot less powerful. Running substantial models locally requires massive and expensive amounts of RAM and GPU capability.
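
Because tokens cost money, the back-of-envelope maths is worth knowing. The prices in this sketch are hypothetical placeholders, not any vendor’s real rates - check your provider’s current price list:

    # Hypothetical prices for illustration only.
    PRICE_PER_1M_INPUT_TOKENS = 3.00    # USD, made up
    PRICE_PER_1M_OUTPUT_TOKENS = 15.00  # USD, made up

    def estimate_cost(input_tokens: int, output_tokens: int) -> float:
        """Rough USD cost of one request at the prices above."""
        return (input_tokens / 1_000_000 * PRICE_PER_1M_INPUT_TOKENS
                + output_tokens / 1_000_000 * PRICE_PER_1M_OUTPUT_TOKENS)

    # e.g. a 50,000-token document summarised into 2,000 tokens:
    print(f"${estimate_cost(50_000, 2_000):.2f}")  # $0.18 at these rates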

So let’s explore what LLMs are bad at:

  • Feeling emotions:
    • LLMs are actually really good at faking empathy and talking about psychology and feelings - in fact it’s one of the main uses for the technology in society today - but they don’t really feel anything.
  • Being 100% accurate, and there are a few things to consider here:
    • Many LLMs have information cut-off dates.
    • This limitation is being reduced now by allowing LLMs to search the internet.
    • If you ask some models who the US president is, many will not be able to name Donald Trump. They’ll tell you Biden was president, because their information is cut off as of X date.
  • Inventing new things:
    • Large language models remix what they’ve seen rather than create from scratch.
    • Having said this, the claim is quite nebulous, because you can get LLMs to solve new calculations or problems and iterate toward new things - so it’s a stretch.
  • Solving complex problems:
    • Multi-step logic or maths can trip them up (on their own) - but you can use AI to build software that’ll solve the problem for you, as the sketch below shows.
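
Here’s a minimal sketch of that delegation pattern, using the sympy algebra library (pip install sympy). The equations are arbitrary examples:

    from sympy import Eq, solve, symbols

    # Multi-step algebra a chat model might fumble in its head,
    # but can reliably delegate to code it writes for you.
    x, y = symbols("x y")
    solution = solve([Eq(3*x + 2*y, 12), Eq(x - y, 1)], [x, y])
    print(solution)  # {x: 14/5, y: 9/5}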

Whole legions of writers will harp on the above points. You need to almost ignore the noise.

The things you need to focus on are speed, coding, and the ability to direct AI to complete discrete tasks at an acceptable error rate.

Yes, AI makes mistakes, but the statistical importance of that fact is far, far lower than you think.

Humans make mistakes too.

The best way to think about it is through the lens of significant figures. In science we can only reliably notate a number to a certain number of decimal places, based on the observed accuracy of the instrument. Once AI models can do a given task accurately - at a meaningfully important level - not being 100% correct really won’t matter.

E.g. Vehicle autonomy.

A robot car could crash. But if its accident rate is 17 times better than a human’s, and the extent of damage in an accident is less, it’s only a matter of time before robotic cars take over from humans. It’ll be a scientific imperative.
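
To make the arithmetic concrete - the numbers below are illustrative, lifted from the claim above, not real accident statistics:

    # Back-of-envelope harm comparison (illustrative numbers only).
    human_crash_rate = 1.0        # normalised baseline
    robot_crash_rate = 1.0 / 17   # '17 times better', per the claim
    human_damage = 1.0            # average damage per crash, baseline
    robot_damage = 0.5            # assume crashes are half as severe

    human_harm = human_crash_rate * human_damage
    robot_harm = robot_crash_rate * robot_damage

    print(robot_harm / human_harm)  # ~0.03, i.e. ~3% of the harm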

Conclusion

Many of the LLMs launched a few years ago were toys. They had small context windows, no connectivity to the internet, and they hallucinated a lot.

The newer models are a whole different ball game. Unfortunately, a lot of the ‘truisms’ or ‘facts’ about AI are now ambiguous.

Your only takeaway here is this. Can AI solve problems? Can it write code to achieve things? Can it do that with a level of precision that leaves humans for dead? And can the outputs be reliably measured for error over time? If the answer to all those questions is yes, then - despite any problems - AI will rapidly advance.