
Research Log/Tracing

(9)

Offcore Response Event Event number 0B7H supports offcore response monitoring using an associated configuration MSR, MSR_OFFCORE_RSP0 (address 1A6H) in conjunction with umask value 01H or MSR_OFFCORE_RSP1 (address 1A7H) in conjunction with umask value 02H. Table 18-14 lists the event code, mask value and additional off-core configuration MSR that must be programmed to count off-core response events using IA32_PMCx. The..

Intel PMU There are a finite number of performance event select MSRs (IA32_PERFEVTSELx MSRs). The result of a performance monitoring event is reported in a performance monitoring counter (IA32_PMCx MSR). Both registers follow the rules below. The bit field layout of IA32_PERFEVTSELx is the same across all microarchitectures. The addresses of the IA32_PERFEVTSELx MSRs and of the IA32_PMC MSRs are the same across all microarchitectures. For every logical processor, IA32_PERFEVTSELx ..
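Because the IA32_PERFEVTSELx layout is shared, one encoder can serve any event. The following is a minimal sketch, assuming the architectural bit positions (event select in bits 0-7, umask in bits 8-15, USR bit 16, OS bit 17, EN bit 22, CMASK in bits 24-31). The function name `encode_perfevtsel` and its parameters are hypothetical and chosen only for illustration. The sample values are the event code 0B7H and umask 01H from the offcore response excerpt above. Programming the companion MSR_OFFCORE_RSP0 is out of scope here.

```python
def encode_perfevtsel(event: int, umask: int, usr_mode: bool = True,
                      os_mode: bool = True, enable: bool = True,
                      cmask: int = 0) -> int:
    """Pack fields into an IA32_PERFEVTSELx value (illustrative helper)."""
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    value = event            # bits 0-7: event select
    value |= umask << 8      # bits 8-15: unit mask
    if usr_mode:
        value |= 1 << 16     # USR: count while in user mode
    if os_mode:
        value |= 1 << 17     # OS: count while in kernel mode
    if enable:
        value |= 1 << 22     # EN: enable the counter
    value |= cmask << 24     # bits 24-31: counter mask
    return value


# Offcore response event 0B7H with umask 01H (pairs with MSR_OFFCORE_RSP0).
print(hex(encode_perfevtsel(0xB7, 0x01)))
```

The low 16 bits of the result (`0x01b7`) are the event and umask pair, which is the part that tools usually expose as a raw event code.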

Advanced profiling topics. PEBS and LBR. multiplexing and scaling events If there are more events than counters, the kernel uses time multiplexing (switch frequency = HZ, generally 100 or 1000) to give each event a chance to access the monitoring hardware. Multiplexing only applies to PMU events. With multiplexing, an event is not measured the whole time. At the end of the run, the tool scales the count separately: final_count = raw_count * time_enabled / time_running. Therefore, depending on the workload, bl..
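The scaling formula in this excerpt can be written out as a small sketch. The function name `scale_count` and the sample numbers are hypothetical. Returning `None` when the event never ran is a choice made for this sketch, not something the excerpt specifies.

```python
from typing import Optional


def scale_count(raw_count: int, time_enabled: int,
                time_running: int) -> Optional[float]:
    """Estimate the full count of a multiplexed event.

    final_count = raw_count * time_enabled / time_running
    """
    if time_running == 0:
        # The event never got a counter, so there is nothing to scale.
        return None
    return raw_count * time_enabled / time_running


# Hypothetical numbers: the event was on the PMU for a quarter of the time
# it was enabled, so the raw count is scaled up by a factor of four.
print(scale_count(1_000_000, 10_000_000, 2_500_000))
```

The result is an estimate: it assumes the event rate during the unmeasured intervals matched the rate during the measured ones.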

The PMCs of EC2: Measuring IPC ... as with the increasing scale of processors and speed of storage devices, the common bottleneck is moving from disks to the memory subsystem: CPU caches, the MMU, memory busses, and CPU interconnects. These can only be analyzed with PMCs. PMC Usage PMCs can be used in two ways: counting and sampling. Counting: tallies the number of occurrences. Sampling: raises an interrupt based on the event count (on the interrupt, the stack trace, PC (Program Counter) s..
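In counting mode, IPC (instructions per cycle) is just the ratio of two counter totals. A minimal sketch, with a hypothetical function name `ipc` and made-up counter values:

```python
def ipc(instructions: int, cycles: int) -> float:
    """Instructions per cycle from two counting-mode PMC totals."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


# Hypothetical totals read after a run: 2M retired instructions in 1M cycles.
print(ipc(2_000_000, 1_000_000))
```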

Perf Events Counter, MSRs - model specific registers https://easyperf.net/blog/2018/06/01/PMU-counters-and-profiling-basics Counting vs. Sampling Counting - disable counting - set all the counters to 0 - configure events that we want to measure - enable counting - run the application - disable counting - read the values of the counters Sampling - set counter to 0 - enable counting - wait for the overflow an..

Thoughts on CPU cycles What is retired instruction? Modern processors execute many more instructions than the program flow needs. This is called speculative execution. Instructions that were “proven” as indeed needed by the program execution flow are “retired”. What is reference cycle? Having a snippet A to run in 100 core clocks and a snippet B in 200 core clocks means that B is slower in general (it takes double t..

BTF, CO-RE Brendan Gregg's Blog [link] BTF: BPF Type Format, which provides struct information to avoid needing Clang and kernel headers. CO-RE: BPF Compile-Once Run-Everywhere, which allows compiled BPF bytecode to be relocatable, avoiding the need for recompilation by LLVM. PingCAP Article [link] Drawbacks of BCC The BCC (BPF Compiler Collection) toolkit was built to support effective kernel tracing, but it has several drawbacks. BCC ... LLVM or Cla..

PCI The dmidecode command reveals a lot of hardware information (requires sudo). The output starts with "Scanning /dev/mem for entry point." - OEM-specific Type (?) - BIOS setting - System Information - Base Board Information - Chassis Information - Processor Information - Cache Information (L1, L2, L3) - Port Connector Information - System Slot Information ... Memory Device Array Handle: 0x003E Error Information Handle: Not Provided Total Width: 64..

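[17 SOSP] Canopy: An End-to-End Performance Tracing And Analysis System Evaluation Canopy has been deployed and used in Facebook's production environment for the past two years. This section shows how Facebook engineers have used Canopy to diagnose performance problems. After evaluating Canopy's overhead and load-shedding properties, it shows how Canopy addresses the challenges described in Section 2.2. Canopy - enables rapid performance diagnosis and performance modeling across many heterogeneous systems that behave differently. - lets many users customize it concurrently for use cases with different goals. - allows the trace model to be improved independently for new use cases and execution conditions. - with low overhead, a large number of traces ..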
