What is AI really? Is it ChatGPT? GPT stands for Generative Pre-trained Transformer. It is a model that learns from a big chunk of data collected from all around the Internet.
Text to Text
Most AI services are text based, in the style of ChatGPT. They are really just text to text: text goes in and text comes out.
- there is GPT-3.5,
- there is GPT-4,
- and there are many more, in multiple variations,
- trained with different parameters, some on data only up to 2019,
- and the newer models on more recent data.
To be trained, all these models need data that is collected from the internet and stored. Mathematical transformations are run on this data so the model can extrapolate from it, and this is what is called machine learning. The result is a transformation of the data into something much smaller, built in such a way that you can ask it something and it returns information based on what those mathematical operations extracted from the data.
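To make the idea of "data transformed into something smaller that can still answer a prompt" concrete, here is a toy sketch. It is not how GPT works internally: it only counts which word follows which in a tiny text. The function names `train_bigram_model` and `generate` and the sample sentence are made up for this illustration.

```python
import random
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    model = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        model[current_word][next_word] += 1
    return model


def generate(model, prompt, max_words=5, seed=0):
    """Continue the prompt by repeatedly sampling a likely next word."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    output = prompt.lower().split()
    if not output:
        return ""
    for _ in range(max_words):
        followers = model.get(output[-1])
        if not followers:  # the model never saw this word, so it has nothing to add
            break
        candidates = list(followers.keys())
        weights = list(followers.values())
        output.append(rng.choices(candidates, weights=weights)[0])
    return " ".join(output)


# Hypothetical training data: the "internet" here is one short sentence.
training_text = "the cat sat on the mat and the cat ate the fish"
model = train_bigram_model(training_text)
```

Calling `generate(model, "the")` continues the prompt with words that followed "the" in the training text, while a word the model never saw, such as "zebra", comes back unchanged. The model can only return what its data contained.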
Machine Learning Variables
The variables are:
- how old and how specific the data is,
- which mathematical transformations are executed on the data.
I was not very attentive in university and I did not memorize the mathematics courses very well, so I cannot say much about the mathematical side. On the data side, though, the big corporations are so valuable precisely because they collect all this data:
- Facebook,
- Google,
- even Tesla is now valued as highly as it is because it has collected so much data from all its cars on the road.
Yes, I went too far on a topic that is not related to this subsection.
Data Source
The first point about text-to-text is that the output depends heavily on the data the model learned from: the better the original information, the better the output. For example, there is GitHub Copilot, which learned from source code on GitHub, and there is the standard ChatGPT, which learned programming from articles all around the internet. Which of those two models do you think will do better on code? Of course Copilot, which used the source code from GitHub as its learning material.
Age of Data
All the code you get from either model depends on the source code that the machine learning algorithm was trained on. All the code I have received when asking the GPTs for code was old:
- it gives deprecated Spring Boot code,
- it returns Flutter packages that are out of date and have not been updated to sound null safety.
Developers around the world do not sleep, and no one can keep up with all of them. Neither can artificial intelligence keep up with all the developments. I have experienced this first hand.
Text to Speech
Text-to-speech, also called TTS, is something OpenAI has worked on, and they offer a text-to-speech generation API. All the operating systems have some text-to-speech software, which makes it easier for blind or vision-impaired individuals to listen to what an application or a website is showing.
I really enjoy one piece of software called Read Aloud. It reads books very clearly, almost like a human. The voice still sounds like a computer, but it is good enough for me to understand what the book is about.
Speech to Text
I also tried speech-to-text in university. At that time there was software called Sphinx. I am not sure how it developed, but all the big corporations have developed their own speech-to-text internally.
- Apple has Siri.
- There is the Google Assistant, which I tried to hook into through the prompts that it supports. The phrase I worked with is “take a note”, and I made it be handled by my app, if you have it installed. But Google’s developers move too fast, on Android and beyond.
- Microsoft has Cortana, but they’ve discontinued it.
Right now the assistants are built on artificial intelligence algorithms. All the work that I have done is deprecated, because the new models, new APIs, and new integrations change too fast. New developments now use AI: they probably have a huge database of audio with the matching text bound to it, and very optimized algorithms that map any new incoming audio and transform it to text. There is an open-source version of this, the Whisper project from OpenAI. I will probably check it out, because it is free and I could run it on my own machine.
They have improved the speech-to-text part. I have tried it in Bulgarian as well, and it works.
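Running Whisper on your own machine could look roughly like the sketch below. It assumes the open-source `openai-whisper` Python package and ffmpeg are installed separately. The helper name `transcribe_audio`, the "base" model size, and the file name in the usage line are placeholders, not something from my own setup.

```python
def transcribe_audio(audio_path, language=None, model_name="base"):
    """Transcribe an audio file to text with a locally run Whisper model.

    language is an optional hint such as "bg" for Bulgarian; when it is None,
    Whisper tries to detect the language itself.
    """
    # Imported lazily so this file still loads when the package is missing.
    import whisper  # assumption: installed with `pip install openai-whisper`

    model = whisper.load_model(model_name)  # downloads the weights on first use
    result = model.transcribe(audio_path, language=language)
    return result["text"].strip()
```

Usage would be something like `transcribe_audio("note.mp3", language="bg")`. Bigger model sizes are slower but usually more accurate.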
Text to Image
The next thing OpenAI is doing that counts as AI is text-to-image: the DALL-E project. Initially Google tried something similar, because they have a big database of images that are labeled:
- this is a cat,
- this is a dog,
- that is a human,
- that is whatever.
OpenAI probably owns a similar database, so the machine has something to learn from.
In my personal experience it does not do this perfectly. Maybe it is the prompts that I tried, but the images are recognizably AI generated.
Image to Text
Another point is image-to-text, also called optical character recognition (OCR). I have worked a little with https://github.com/tesseract-ocr/tesseract, which I integrated into my notes app. OpenAI has something similar, but their implementation also tries to guess what is in an image, beyond text.
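As a rough idea of what calling Tesseract looks like, here is a minimal sketch that drives the `tesseract` command-line tool from Python. It is not the integration from my notes app. The helper names `build_tesseract_command` and `image_to_text` are invented for the example, and it assumes the tool is installed and on the PATH.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, language="eng"):
    """Build the command line that asks Tesseract to print the recognized text."""
    # "stdout" as the output base makes Tesseract print instead of writing a file.
    return ["tesseract", str(image_path), "stdout", "-l", language]


def image_to_text(image_path, language="eng"):
    """Run Tesseract on an image and return the recognized text."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("the tesseract command-line tool is not installed")
    completed = subprocess.run(
        build_tesseract_command(image_path, language),
        capture_output=True,
        text=True,
        check=True,  # raise if Tesseract reports an error
    )
    return completed.stdout.strip()
```

For a scanned note, `image_to_text("scan.png")` would return the text Tesseract recognized. Passing `language="bul"` selects Bulgarian, provided that language data is installed.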
Video
Videos are just a series of images displayed super fast, one after another, which gives the impression of motion. What is challenging here, or was challenging before the improvements in codecs, is to encode the video in a smaller format while keeping high quality.
So if AI can produce images from text, it may also be able to produce video.
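A bit of arithmetic shows why encoding matters. The sketch below computes the size of a video stored as raw images, then shows a toy version of the "store only what changed" idea. Real codecs are far more sophisticated than this. The function names and numbers are only illustrative.

```python
def raw_video_bytes(width, height, fps, seconds, bytes_per_pixel=3):
    """Size of an uncompressed video: one full image for every frame."""
    frame_bytes = width * height * bytes_per_pixel
    return frame_bytes * fps * seconds


def changed_pixels(previous_frame, next_frame):
    """Return (index, new_value) only for the pixels that differ between two frames."""
    return [
        (index, new)
        for index, (old, new) in enumerate(zip(previous_frame, next_frame))
        if old != new
    ]


# One minute of Full HD video at 30 frames per second, 3 bytes per pixel.
one_minute_full_hd = raw_video_bytes(1920, 1080, fps=30, seconds=60)

# Two tiny "frames" of four pixels each; only one pixel changes.
delta = changed_pixels([0, 0, 5, 5], [0, 1, 5, 5])
```

Here `one_minute_full_hd` comes to 11,197,440,000 bytes, about 11 GB for a single minute, while `delta` holds just one changed pixel instead of a whole new frame.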
AI in the Real World
The big issue for AI in reality is the physicality of the machine, so that it can move around the world.
AGI – Artificial general intelligence
In my opinion, the big challenge for AI is to actually live in the real world. The big issue for AI is the reality that we live in. It is physical and chemical, very much beyond digital.
Moving AI
An AI could make a robot move around the world. Even Elon Musk is creating a humanoid robot, and I think Boston Dynamics and some other companies have been trying to build a robot that moves like a human.
The core problem is that the machine needs to master gravity.
In cars and trucks the machine is on wheels, and it is somewhat easy to move around. It just needs to recognize where the road is and stay on it. If it goes off the road, it is better for a human to take over.
AI Physicality
A humanoid robot needs to understand what is soft, what is hard, and where its own center of weight is. It will be like a baby trying to walk, so it will take a lot of time to actually learn it. A normal human takes several years to learn to walk like their parents. And we humans are very specialized: we reproduce and grow in the millions, and no matter what race you are, we all share a common biology.
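The "center of weight" can be written down as a weighted average of where the mass sits. Here is a simplified two-dimensional sketch. The part list, the masses, and the helper names `center_of_mass` and `is_balanced` are assumptions for illustration, and a real walking robot has to solve a much harder dynamic problem.

```python
def center_of_mass(parts):
    """Weighted average position of parts given as (mass_kg, x_m, y_m) tuples."""
    total_mass = sum(mass for mass, _, _ in parts)
    if total_mass <= 0:
        raise ValueError("total mass must be positive")
    x = sum(mass * px for mass, px, _ in parts) / total_mass
    y = sum(mass * py for mass, _, py in parts) / total_mass
    return x, y


def is_balanced(parts, foot_left_x, foot_right_x):
    """A standing body stays upright while its center of mass is above its feet."""
    x, _ = center_of_mass(parts)
    return foot_left_x <= x <= foot_right_x


# Hypothetical robot: legs, torso, and an arm stretched out to the side.
robot = [(20, 0.0, 0.5), (30, 0.0, 1.0), (10, 0.3, 1.2)]

# The same robot after picking up a heavy load with that arm.
robot_with_load = [(20, 0.0, 0.5), (30, 0.0, 1.0), (50, 0.3, 1.2)]
```

With feet spanning from -0.1 m to 0.1 m, `is_balanced(robot, -0.1, 0.1)` is true, because the center of mass sits at about x = 0.05. The same call for `robot_with_load` is false: the extra weight moves the center to about x = 0.15, outside the feet.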
Robot Shapes
But robots are created with different hardware, different materials, and different shapes and sizes. One small variation in the material, its weight, or the size of the robot will make the movement algorithm, the robot's learned knowledge of how to move, work less well.
My opinion on artificial general intelligence is that it may come if all the above obstacles are mastered by some AI system and combined into one.
Robot Self-Update
Could an AI update itself? It is probably possible for software. Java has class reloading at runtime, and there are projects like OSGi. Linux has introduced kernel live patching. But:
- What about failing software?
- What about not having enough RAM, ROM, or CPU?
- Will a machine be able to learn to produce electricity and power itself?
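The software side of self-update can be shown in a few lines. This sketch uses Python's `importlib.reload` as a stand-in for the Java class reloading mentioned above; it is an analogue, not the same mechanism. The module name `robot_skill_demo` and the function `demo_hot_reload` are invented for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_hot_reload():
    """Load a module, rewrite its source on disk, and reload it in the same process."""
    old_flag = sys.dont_write_bytecode
    sys.dont_write_bytecode = True  # skip .pyc caching so the new source is always read
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "robot_skill_demo.py"  # hypothetical module
        source.write_text("def describe():\n    return 'version 1'\n")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            skill = importlib.import_module("robot_skill_demo")
            before = skill.describe()

            # The "update": new code arrives while the program keeps running.
            source.write_text("def describe():\n    return 'version 2, updated'\n")
            skill = importlib.reload(skill)
            after = skill.describe()
        finally:
            # Leave the interpreter as we found it.
            sys.path.remove(tmp)
            sys.modules.pop("robot_skill_demo", None)
            sys.dont_write_bytecode = old_flag
    return before, after
```

Calling `demo_hot_reload()` returns the answer of the old code followed by the answer of the new code, without restarting the program. It does nothing about the questions in the list: a broken update, missing memory, or missing power are not solved by reloading code.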
Will Artificial Intelligence Understand?
Artificial General Intelligence needs to:
- understand text, and the core part is to understand,
- recognize images, and so also recognize video,
- recognize an object that is coming toward it and understand the physicality of that object:
- is it harmful,
- is it friendly,
- or is it something the robot could go through, like a sandstorm, or is it another truck that it cannot go through.
It needs to understand language through speech-to-text and text-to-speech, to give back answers, and to recognize whether the party speaking in front of it is harmful to it or not. Does the individual in front of it have a war-waging attitude? There are probably specialized systems that recognize the attitude of a human, but what about other robots, which do not have expressions like we do?
Artificial General Intelligence needs to combine all of this and merge it into one piece of software and one piece of hardware, together with all the specialized software for all of this super-heavy math. They need to be merged into something that can actually move around the world.