Hi folks,
As promised in our previous article, today we are going to tell you a thing or two about CPUs.
And we should start by warning you that, for a change, we thought we’d modify the format of our articles a little, just to see whether you like it or not.
So, inspired by the format made popular by Penn & Teller on Discovery Channel, and given that the subjects of our articles (all in the IT & C domain, sometimes with a bit of related history) often contain details so spectacular they would be hard to tell apart from a credible lie, we decided that, for the next few articles, we would insert one big lie into the content of each article.
It will be just one single lie (no more than one, we don’t intend to put your nerves to the test) but, because of it, we will obviously have to stop linking our content to the original sources for the duration of this series.
OK, so back to our subject: a CPU (Central Processing Unit) is the main microprocessor in charge of processing data within a computer.
This is why in the technical details list of any computer, the first information provided is about the CPU.
What it should process comes, of course, from the sequences of instructions in programs: it could be a spreadsheet, a music player, a video editor or an anti-virus, you name it.
For the CPU it makes no difference at all; it just keeps on executing instructions as they come.
But the way processing is done is extremely complex.
Back to the list of technical details of any computer, there is always a strange thing on that list, right next to the CPU type, called the « clock ».
Basically, the clock is what imposes the same rhythm on all of the computer’s internal components, similar to the « tempo » in symphonic music.
Because each component has its own particularities in terms of speed (like the musical instruments in a symphonic orchestra), a single tempo must be maintained, otherwise everything would fall into a mess.
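Just to give a taste of the idea, here is a minimal sketch (in Python, with made-up component names, nothing like a real motherboard) of a shared clock driving everything in lockstep:

```python
# A toy shared clock: on every tick, each component advances exactly one
# step, no matter how different its internal job is. The component names
# are made up for illustration.

components = {"CPU": 0, "RAM": 0, "disk controller": 0}

for tick in range(1, 4):              # three clock ticks
    for name in components:
        components[name] += 1         # everyone moves in lockstep
    print(f"tick {tick}: {components}")
```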
An important synchronisation aspect is the one between the CPU and the RAM (Random Access Memory): due to their nature, the CPU works much faster than data can be read from or written to the RAM.
And because exchanging data with the « outer world » delays the CPU, this is also the reason why another technical detail is usually found on the performance list of a computer, next to the CPU: its cache-memory size.
Cache-memory is a special kind of memory, embedded in the CPU, which is much more expensive and physically bigger than the RAM, but also incomparably faster.
In fact, it is so fast it can « play at the same tempo » as the CPU (ie, work at the same clock as the CPU), which the RAM cannot.
The bigger the cache-memory, the better the overall performance, because each time the CPU loads data from a certain position in the RAM, a component called the « memory cache controller » fetches even more data from adjacent positions of the RAM and places it into the CPU’s cache.
So when the CPU is done doing something and needs to fetch data from « outside » again, it is very probable that the required data is already loaded in its cache, ready to be used at maximum clock speed, with no further time wasted (ie, a certain number of « clock tick » delays) waiting for that data.
Actually, even different internal CPU components might work at different clock rates, but that is already too much info for the scope of this article. Let’s just conclude that, even though the cache controller is not infallible and the data loaded into the cache-memory might turn out to be useless to the CPU, a bigger cache-memory does mean fewer interruptions for the CPU to grab its data from the « outside world » and, therefore, higher overall performance.
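To make this concrete, here is a minimal sketch (in Python, with made-up sizes and a deliberately naive controller, far simpler than the real thing) showing how prefetching adjacent RAM positions turns most reads into cache hits:

```python
# A toy model of a memory cache controller that prefetches adjacent
# RAM positions. The sizes and the access pattern are made up for
# illustration; real controllers are far more sophisticated.

RAM = list(range(1000))   # pretend RAM: 1000 positions
LINE_SIZE = 8             # how many adjacent positions get prefetched
cache = {}                # address -> value
hits = misses = 0

def read(address):
    """Read one value, going through the cache first."""
    global hits, misses
    if address in cache:
        hits += 1                      # fast path: data already in cache
    else:
        misses += 1                    # slow path: go out to the RAM...
        for a in range(address, min(address + LINE_SIZE, len(RAM))):
            cache[a] = RAM[a]          # ...and prefetch the neighbours too
    return cache[address]

# A program scanning through memory sequentially (a very common pattern):
for addr in range(1000):
    read(addr)

print(f"hits: {hits}, misses: {misses}, hit rate: {hits / (hits + misses):.0%}")
# With LINE_SIZE = 8, seven out of every eight reads are served from the cache.
```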
All in all, the simplest possible logical scheme of any CPU is quite straightforward at this point: a memory cache unit (containing data fetched from the RAM), then an instruction cache unit (containing instructions), then a fetch unit (which grabs the next instruction for execution), then a decode unit, then an execution unit and, finally, a data cache unit, where the results of the processing are stored.
The decode unit figures out how to execute a certain instruction: it looks into the internal ROM (Read Only Memory) of the CPU (each CPU has one) and, based on the microcode it finds there, it knows how any kind of instruction should be executed.
For example, if the instruction is the mathematical addition « X + Y », it will first ask the fetch unit for the values of both X and Y and then pass all the data (the values of X and Y, along with the « step-by-step microcode guide ») to the execution unit.
Of course, the execution unit finally executes the instruction, and the results are sent to the data cache.
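As a minimal sketch (in Python, with a made-up three-instruction program and a toy opcode table standing in for the real microcode ROM), the whole fetch-decode-execute chain looks roughly like this:

```python
# A toy fetch-decode-execute chain. The three-instruction « program »
# stands in for the instruction cache, and the opcode table stands in
# for the CPU's internal microcode ROM; real CPUs are vastly richer.

registers = {}
data_cache = []                        # where results end up

def load(dst, value):                  # microcode for LOAD
    registers[dst] = value

def add(a, b):                         # microcode for ADD
    registers["RESULT"] = registers[a] + registers[b]

microcode_rom = {"LOAD": load, "ADD": add}

program = [                            # pretend instruction cache
    ("LOAD", "X", 7),                  # X = 7
    ("LOAD", "Y", 5),                  # Y = 5
    ("ADD", "X", "Y"),                 # X + Y -> RESULT
]

for instruction in program:            # fetch unit: grab the next instruction
    opcode, *operands = instruction
    execute = microcode_rom[opcode]    # decode unit: look up the microcode
    execute(*operands)                 # execution unit: actually run it

data_cache.append(registers["RESULT"])
print(data_cache)                      # -> [12]
```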
But there are some interesting tricks CPU designers use to increase processing speed.
To begin with, modern CPUs have more than one execution unit, so, for example, having 8 units working in parallel is theoretically like having 8 CPUs.
This is called superscalar architecture.
Then, each execution unit can have a different specialization (for a particular subset of instructions): for example, a mathematical-operations execution unit (to which the « X + Y » instruction above would be sent) is called a Floating-Point Unit (FPU), to tell it apart from a « generic » execution unit (called an Arithmetic and Logic Unit, ALU).
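A minimal sketch of the idea (in Python, with an oversimplified two-unit split and made-up opcodes, not any real instruction set) could look like this:

```python
# A toy dispatcher for a superscalar CPU with two specialised execution
# units. The two-unit split and the opcode names are made up for
# illustration.

def alu(op, a, b):
    """The « generic » Arithmetic and Logic Unit (whole numbers)."""
    return a + b if op == "ADD" else a - b

def fpu(op, a, b):
    """The Floating-Point Unit (decimal numbers)."""
    return a + b if op == "FADD" else a - b

instructions = [("ADD", 2, 3), ("FADD", 1.5, 2.25), ("SUB", 9, 4)]

for op, a, b in instructions:
    unit = fpu if op.startswith("F") else alu   # route by specialization
    print(f"{op} dispatched to {unit.__name__.upper()}: {unit(op, a, b)}")
```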
Another trick is the « pipeline », and it is based on the sequential character of the units.
For example, after the fetch unit has sent an instruction to the decode unit, it would normally sit idle.
To use this « idle » time in a productive manner, the fetch unit grabs the next instruction instead of « pausing », sends it to the decode unit, then moves on to fetch the next instruction, and so on.
This principle applies to the entire chain of CPU units, thus creating a « pipeline » that can be several stages long.
So a CPU with an « x-stage » pipeline effectively performs « x » operations simultaneously.
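Here is a minimal sketch (in Python, with an invented 3-stage pipeline and 5 dummy instructions) of how the stages overlap, tick by tick:

```python
# A toy 3-stage pipeline (fetch -> decode -> execute). On every clock
# tick each instruction moves one stage forward, so up to 3 instructions
# are « in flight » at once. Stage names and count are simplified.

STAGES = ["fetch", "decode", "execute"]
instructions = ["I1", "I2", "I3", "I4", "I5"]

ticks = len(instructions) + len(STAGES) - 1
for tick in range(ticks):
    # Instruction i occupies stage s on tick i + s.
    in_flight = []
    for s, stage in enumerate(STAGES):
        i = tick - s
        if 0 <= i < len(instructions):
            in_flight.append(f"{stage}:{instructions[i]}")
    print(f"tick {tick + 1}: " + "  ".join(in_flight))

# The 5 instructions finish in 7 ticks instead of the 15 (5 x 3) that
# a non-pipelined CPU would need.
```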
There are other techniques to increase processing power, of course, but detailing them here would make no sense.
So why did we mention a few of them in the lines above?
Just to make the point that Moore’s Law, which we mentioned in our previous article (processing power doubling every two years or less), is not only due to squeezing more and more transistors onto the same size of chip: it is also about other techniques.
And why mention that?
Well, because if Moore’s Law keeps on proving true, it would mean that in the next 15 or 20 years the transistor-based processor era will come to an end, because the sizes reached would be of atomic scale.
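A quick back-of-the-envelope check (assuming, purely for illustration, a ~14 nm transistor feature size today, a halving every two years and atoms roughly 0.2 nm across) lands in the same ballpark:

```python
# Back-of-the-envelope check with assumed numbers: a ~14 nm feature
# size today, halving every ~2 years, atoms being roughly 0.2 nm across.
size_nm, years = 14.0, 0
while size_nm > 0.2:          # stop once we reach atomic scale
    size_nm /= 2
    years += 2
print(years)                  # -> 14: in the same ballpark as 15-20 years
```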
So the time for a new leap in IT is soon to come: the era of quantum (super)computers.
These computers have been theorized since the end of the 1960s and have been looked into more seriously since the early 1980s.
Instead of the « 0 » and « 1 » binary digits (bits), such supercomputers would work based on « qubits », which are properties of quantum physics particles known as « quakers ».
Heavy governmental and private funding has already been assigned to research in this domain, so the race for quantum computers is already on.
Well, folks, see you next time, when we are also going to tell you where the deliberate lie in the content of this article was!
Bogdan