
Computational Performance: Memory accesses – cache memory

Academic year: 2021


(1)

Analysis and modeling of

Computational Performance

Memory accesses – cache memory

(2)

Memory accesses – virtual memory

Typical memory instruction:

movsd %xmm0, 24(%rsp)

Arguments in:

registers

main memory (primary storage)

• address calculated based on the values stored in registers

Addresses

in virtual address space

• size: 2^n, n = 16, 32 or 64 (addresses range from 0 to 2^n - 1)

• n – characteristic to processor and operating system

translated to physical addresses

• paging – the most popular mechanism

• segments, uniform, ...

(3)

Paged virtual memory

Paged virtual memory

virtual address space of the process divided into (virtual) pages

• page size varies; usually 4 KiB

• physical memory divided into frames (physical pages) of the same size as virtual memory pages

• each virtual address divided into page number and address within page (example: 32-bit address)

• virtual addresses are translated into physical addresses

(4)

Paged virtual memory

Paged virtual memory

address translation:

• each page is associated with an entry in the page table

• the page table specifies whether the page has an assigned physical memory frame with its data

if yes, the page table entry contains the physical address of that frame

(5)

Paged virtual memory

Paging (page swapping)

virtual address space usually larger than physical memory

• only fraction of all pages from virtual address space in physical memory

• some pages may be stored in secondary storage (HDD, SSD)

• operating system manages the storage of pages

when accessed page not in physical DRAM memory – page fault

» minor page fault – the virtual page has not yet been assigned a physical frame – possibly small overhead (no HDD or SSD access)

» major page fault – the virtual page has been moved to secondary storage – page swap between secondary and primary storage, large overhead

(6)

Paged virtual memory

Page table

page table can be large – millions of entries (e.g. 2^(n-12), n = 32, 64)

• stored in primary memory and possibly in secondary memory

to speed up access to the page table, it has a separate cache – the TLB

• TLB – translation lookaside buffer

• within core

• up to several hundred entries

there may be many page tables managed by the operating system (for different processes, segments, etc.)

(7)

Memory access

Instruction with virtual address

address translation

• TLB look-up

TLB hit -> physical page address with data

TLB miss -> page walk – a search of the page table in memory

» entry found, physical page assigned -> TLB update, memory access retry, TLB hit, physical page address with data

» entry found, no physical page assigned -> page fault (both minor and major faults lead to page table and TLB update)

physical memory access

• caches, cache coherence protocol, DRAM

some processors perform address translation and L1 cache access concurrently (at least partially)

Thrashing

frequent page swaps due to many major page faults

must be avoided (ideally no major page faults occur at all)

(8)

Memory wall

Typical memory instruction:

movsd %xmm0, 24(%rsp)

Arguments:

registers

• operating at the speed of processor core

main memory (primary storage)

• slow DRAM modules

(9)

Memory wall

Typical loop:

.L5:

addsd 0(%rbp,%rdx), %xmm7

addq $8, %rdx

cmpq $80000000, %rdx

movsd %xmm7, (%rsp)

jne .L5

in each iteration:

• time for arithmetic operations: several cycles

• time for DRAM memory accesses: hundreds of cycles

Two mechanisms for latency hiding:

cache memories

prefetching (hardware and software)

• e.g. for the loop above

addsd 0(%rbp,%rdx), %xmm7

[prefetch data for next iteration]

• critical: the hardware's ability to process many memory requests concurrently

(10)

Memory hierarchy

Solution to "memory wall" -> memory hierarchy

(11)

Cache hierarchy

Cache levels: L1, L2, L3, (L4?)

Memory access:

memory request

• L1 hit or ...

• L1 miss

L2 hit (L1 cache line replacement) or ...

L2 miss

» etc. ... and finally:

» DRAM access (cache lines replacement)

Cache effectiveness

locality of accesses

• temporal – the same element accessed several times in a short period of time

• spatial – several elements close together in memory accessed in a short period of time

(12)

Cache organization

Cache organization:

cache size

cache line size

cache associativity

(13)

Mapping of memory blocks to cache lines

Direct mapped cache

• B elements of array a in one line

• each block in memory has exactly one corresponding cache line

Fully associative cache

• B elements of array a in one line

• each block in memory can be stored in any cache line

(14)

4-way set associative cache

Cache line – B variables

First B elements of array a in one of 4 lines in a set

The next B elements in one of 4 lines of the next set

etc.; at a certain moment one of the cache lines must be replaced

(15)

Cache misses

Compulsory misses (aka cold start misses)

• first access to a block

Capacity misses

• due to finite cache size – a replaced block is later accessed again

Conflict misses (aka collision misses)

• in a non-fully associative cache – due to competition for entries in a set; would not occur in a fully associative cache of the same total size

Coherency misses

• due to cache flushes needed to keep multiple caches coherent in a multiprocessor

(16)

Cache performance

(17)

Cache organization

Cache organization details:

inclusive vs exclusive

• e.g. victim cache

Harvard architecture (separate instruction and data caches)

cache coherence

pipelining

non-blocking

multi-banking

line replacement strategies:

• random

• FIFO

• LRU (least recently used)

• LFU (least frequently used)

• more complex algorithms
