Cache memory in detail and hit ratio June 2021

In the computer architecture or computer organization (COA), cache memory is a very fast memory, which makes sure the data reach from the main memory to the CPU faster. We will discuss in detail the Cache hit ratio and types of cache memories in OS and The role of cache between CPU and main memory in computer organization and architecture.

The cache is nothing but a buffer between CPU and RAM. Parameters like Cache hit, Cache miss, Miss rate, Miss Penalty, Cache block, Cache line, and Cache tag, we will understand all of them to increase the cache performance.

We will discuss in detail the cache memory and its working, various types of cache memories, what is cache mapping in memories, and how does locality of reference helps cache to manage its high speed.

We all know there is a very big bridge gap between memory and the CPU. Typically in the current generation, the CPU is faster than the memory, especially comparing with the main memory(RAM). A solution is to have the speed of the main memory(RAM) equals the speed of the CPU so the there is no room for waiting for the CPU to access the data or instructions from the main memory.

So there is a need for a method or scheme by which the CPU can take the data and instruction from the main memory in the fastest possible way. That purpose is filled in by a Cache Memory in the operating system which does it in a better way as expected.

Let us look into the detail, what is a cache, how it works, and what are main issues with proper block diagrams.

Table of Contents

Cache memory in detail

Let’s not complicate much here, few points to know about the cache memories are,

1. Cache is a type of memory in the computer, which is much faster compared to RAM (main memory), and Secondary memory (hard disk, hard drives).

2. Cache helps the CPU in speeding up the data access process from the main memory RAM by holding the most frequently requested data from the CPU and makes the data handy to the CPU when needed. Which reduces much of the average time.

3. Cache acts as a bridge or buffer between RAM and Secondary Memory.

4. Cache is not faster than Registers (Level 1 memory) in the CPU and also cheaper than registers but costlier than RAM and Secondary memories. Because of this reason, cache will be smaller than RAM size in the computer.

Cache can be in two forms blocks and lines (typically 8 or 256).

Let us consider an example If a cache memory is of 256 blocks or lines which are of 32 words each. What is the total cache size? What is the main memory RAM size if it has a 24-bit address?

The total cache size can be given as 256×32 = 8192 that is the cache size is 8k.

In the case of the main memory RAM, if it is a 24bit address memory then it will have 2^24 = 16M words.

Therefore the size of the main memory can be calculated as 16M/32 = 512K is the RAM size.

Refer to the below block diagram in section 3 which tells half of the story here and justifies the above points.

Cache Performance (Cache Hit and Cache Miss)

The data and instructions flow like this, A CPU processor would be needing the data or instructions which might be anywhere in the computer such as in registers, cache, RAM, and worst case in Secondary memory. First, the CPU looks into registers, then cache, and then in the RAM is the sequence.

Cache Hit

Now suppose the CPU wants to read data/instruction or write data into the main memory RAM, the CPU will first check the entry for that particular data in the cache. If the CPU finds the entry in the cache it is a cache hit. A cache hit corresponds to an operation when the data is read or written into a cache location.

Cache Miss

If the CPU was not able to find the entry for that particular data in the cache location then it is a cache miss. A cache miss corresponds to an operation that does create a new entry for the data in the cache location and the CPU brings the data from the main memory(RAM) and paste it into the new memory location entry created in the cache. Then the CPU reads or writes data from the cache by a cache hit.

Cache Hit ratio (Performance)

The performance of the cache is determined by a factor called a cache hit ratio or hit rate. The cache hit ratio is frequently being measured to identify the performance of cache. The cache hit ratio is measured by the ratio of the number of cache hits to the sum of a total number of cache hit and cache miss. or several caches hits upon several accesses.

Cache hit ratio = Number of cache hit / (No. of cache hit + No. of cache miss)

The Cache performance can be improved by taking care of the following points,

1. A cache block with a higher size must be used

2. A cache block with higher associativity

3. Reduce the cache miss rate

4. Reduce the cache miss penalty, and

5. Reduce the time-to-hit in the cache.

Types of Cache Memories

Cache can be categorized into two kinds,

Primary of Level 1 Cache

The is a small and fast cache which primarily placed on the processor chip, similar to registers in the CPU.

Secondary or Level 2 Cache

This cache is relatively bigger and slower than the primary L1 cache mostly placed in between the CPU and main memory RAM.

The locality of reference in memory

The concept or the mechanism of the cache is very effective because of the idea of the locality of reference.

Generally for any program, it is analyzed that out of the total execution of the program it spends most of it in running instructions that are routine and executing repeatedly or frequently. Based on this we can say some part of the memory is getting used very frequently and the left part is very infrequent. So various program’s localized area is executing repeatedly for some duration, called the locality of reference.

The concept is we store that locally referenced data which is more frequent in the cache which is pretty faster compared to RAM so that the CPU access them easily and faster. There can be two kinds of the locality of reference spatial and temporal locality of reference.

Spatial type of locality of reference

This is the condition where there is the probability of a program element being close to the locality of the reference point. And if we search the location same program element again the probability of this element being close to the locality of reference point increases.

This kind of program element will be put into Cache. In a typical case example, we know that a page contains a block of words. Now by use of spatial rule if we put a word in the proximity and word next to this would be placed in Register. This creates a problem when a page fault occurs. so it is always suggested to load a complete page table.

Temporal type of locality of reference

Here an algorithm is used to have the least recently used program elements at one place unlike the spatial type of locality of reference. Mostly these temporal program elements will be put in the main memory (RAM).

Memory Architecture and Different Levels

Level 1 Memory

Level 1 Memory will be the Registers of a CPU which are the fastest and costliest. Typical examples would be accumulator, address registers, and the program counter (PC).

Level 2 Memory

Level 2 Memory will be the cache that we are discussing or write now. Within them, there can be level 1 cache and level 2 cache memories.

Level 3 Memory

Level 3 Memory will be the primary memory or main memory. A RAM (Random Access Memory) is nominally faster and cheap compared to Registers and Cache is put under Level 3 memory.

Level 4 Memory

Level 4 Memory will be the secondary memory, A hard disk is an example. This memory is the cheapest and the slowest among all levels of memories.

Cache Memory Mapping Algorithms

There are three kinds of cache algorithms available,

1. Direct Mapping Algorithm
2. Associative Mapping Algorithm
3. N-Way or Set Associative Mapping Algorithm

Direct Mapping Cache Memory Algorithm

In the direct cache memory mapping algorithm, each of the RAM (main memory) blocks can be placed in one of the blocks of cache.

Direct Mapping Function = RAM blocks % Number of cache blocks

Address of direct memory mapping

TAG – BLOCK – WORD

Here, TAG of cache helps us to decide the mapping of the block of main memory to the cache block. The tag can be calculated as the ratio of the total number of blocks in the main memory RAM to the total number of blocks in the cache.

TAG = (Main memory blocks)/(Cache memory blocks)

Let us take an example, Cache has 256 blocks and the main memory is 512K.

Tag = 512K/256 = (2^19)/(2^8) = 2^11, there fore 11 bits are reserved for TAG in the cache memory address.

If the memory address is 24bit then 11 bits of tag, 8 bits of blocks, and 5bits of the word.

Then, Mapping Memory blocks with Cache blocks

0 – 0
1 – 1
2 – 2
255 – 255
256 – 0
257 – 1

Here we cannot have both blocks at the same time leads to contention problems.

Disadvantages of Direct cache memory mapping

1. Contention problem which is explained in the above example.
2. A new block of main memory will always replace the old block in cache. If these two are the blocks that are to be used most frequently then the performance of the cache falls.

Associative Mapping Cache Memory Algorithm

Here, the algorithm is in such a way that any block of the main memory can be mapped into any block of the cache.

Address of associative mapping cache memory algorithm.

TAG-WORD

Let us take an example, Cache has 256 blocks and the main memory is 512K.

The advantage here is that it gives much freedom for block mapping there is a large room for replacement algorithms.

The disadvantage is that a 19bit TAG bit is to be compared with all Tag registers of cache blocks. Hence need for associative memory to store these Tag values is costly and will be of less scalability.

N-Way or Set Associative Mapping Cache Memory Algorithm

This a balanced cache memory algorithm that is a selective combination of both direct mapping and associative mapping cache algorithms.

N number of choices denotes how many parts of the memory is associative mapping capable and the remaining will be the direct mapping.

For example, N = 1 indicates the algorithm is completely direct mapping and N = 2 or 4 or 8, etc. for N- way set associative.

A typical intel i7 core processor has an 8-way set associative mapping algorithm.

Address of N-way set-associative cache memory mapping algorithm.

TAG-SET-WORD

Let us take an example, assume 4-way set associative mapping, and Cache has 256 blocks and main memory is of 512K.

Number of sets in cache = Number of cache blocks / N-way set = 256/4 = 64 sets

TAG = (Number of main memory blocks) / (Number of sets in cache) = 512K/64 = 2^13 [13bits]

Advantages of N-way set-associative cache memory mapping algorithm.

1. The TAG fields of each SET are being compared with the block of main memory to be associatively mapped with the SET of cache.

2. Instead of using a large single associative memory as in the associative mapping, here we are making use of small sets of associative memories.

Conclusion

In the above topics, we discussed the cache and main memory by considering the word addressing. The is a byte addressing cache also, which is similar in concept but needs to understand the differences. I suggest going through byte addressing in cache once.

We have understood the advantages and disadvantages of all three cache mapping algorithms. The only thing solving cache problem is also a must to understand cache completely. I will try to bring up some cache problems with the solution and attach a link to this post later.

Happy Learning.