
Data Flow

This document describes how energy data moves through the A-LEMS system, from hardware sensors to the database.


🔄 Data Flow Overview

A-LEMS captures energy data at 100Hz and processes it through a 5-stage pipeline:

Detailed Flow:

| Step | Component | Function |
|------|-----------|----------|
| 1 | Hardware | RAPL counters, MSR registers, perf events |
| 2 | Readers | RAPLReader, MSRReader, PerfReader, etc. |
| 3 | EnergyEngine | 100Hz sampling, synchronization |
| 4 | Samples | Raw energy samples with timestamps |
| 5 | Database | Persistent storage in SQLite |

Data Flow Diagram


📡 Stage 1: Hardware Sources

A-LEMS reads from multiple hardware sources simultaneously:

| Source | Location | Data Provided |
|--------|----------|---------------|
| RAPL | `/sys/class/powercap/intel-rapl` | Package, core, uncore, DRAM energy (µJ) |
| MSR | `/dev/cpu/*/msr` | C-state counters, ring bus frequency |
| Perf | `perf_event_open` | Instructions, cycles, cache misses |
| Turbostat | `turbostat` subprocess | CPU frequency, C-state %, temperature |
| Thermal | `/sys/class/thermal` | Thermal zone temperatures |
| Scheduler | `/proc/stat`, `/proc/loadavg` | Context switches, interrupts |
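As an illustration of the RAPL source, the sketch below reads the cumulative energy counters straight from the powercap sysfs tree. It is not the A-LEMS `RAPLReader`; the function name `read_rapl_energy_uj` is hypothetical, and the code assumes a single-socket layout where each zone directory holds a `name` file and an `energy_uj` file.

```python
from pathlib import Path

# Default powercap location, taken from the table above.
RAPL_ROOT = Path("/sys/class/powercap/intel-rapl")


def read_rapl_energy_uj(root: Path = RAPL_ROOT) -> dict[str, int]:
    """Return {domain_name: cumulative energy in µJ} for each readable RAPL zone.

    Hypothetical helper. Assumes one socket, so domain names such as
    "core" do not collide between packages.
    """
    readings: dict[str, int] = {}
    if not root.is_dir():
        return readings  # no RAPL support, or not running on Linux

    zones = []
    for zone in sorted(root.glob("intel-rapl:*")):
        zones.append(zone)                              # package zone
        zones.extend(sorted(zone.glob("intel-rapl:*")))  # sub-zones (core, uncore, dram)

    for zone in zones:
        try:
            name = (zone / "name").read_text().strip()
            readings[name] = int((zone / "energy_uj").read_text())
        except (OSError, ValueError):
            # energy_uj often requires elevated privileges; skip unreadable zones.
            continue
    return readings
```

On a machine without RAPL support, or without read permission, the function returns an empty or partial dictionary instead of raising.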

🔧 Stage 2: Hardware Readers

Each hardware source has a dedicated reader class:

Hardware Readers

All hardware readers follow a common interface pattern:

| Reader | Metrics | Frequency | Description |
|--------|---------|-----------|-------------|
| `RAPLReader` | Package, core, uncore, DRAM energy (µJ) | 100Hz | Intel Running Average Power Limit counters |
| `MSRReader` | C-state counters, ring bus frequency | Snapshots | Model-specific registers for CPU power states |
| `PerfReader` | Instructions, cycles, cache misses | Process-attached | Linux perf events for performance counters |
| `TurbostatReader` | CPU frequency, C-state %, package temperature | 10Hz | Intel turbostat utility wrapper |
| `SensorReader` | Thermal zone temperatures | 1Hz | System thermal sensors |
| `SchedulerMonitor` | Context switches, interrupts | 10Hz | Linux scheduler metrics |

Key Methods

All readers implement:

  • read() → Returns current measurements as a dictionary
  • calibrate() (optional) - One-time calibration
  • get_metadata() (optional) - Reader information

Each reader provides:

  • On-demand reads for start/stop snapshots
  • Continuous sampling for high-frequency data
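The interface described above could be expressed roughly as follows. This is a sketch only: `BaseReader`, `StaticThermalReader`, and `snapshot` are hypothetical names chosen for illustration, and the actual A-LEMS class hierarchy may look different.

```python
from abc import ABC, abstractmethod
from typing import Any


class BaseReader(ABC):
    """Hypothetical base class capturing the three methods listed above."""

    @abstractmethod
    def read(self) -> dict[str, Any]:
        """Return the current measurements as a dictionary."""

    def calibrate(self) -> None:
        """Optional one-time calibration; a no-op unless overridden."""

    def get_metadata(self) -> dict[str, Any]:
        """Optional reader information; defaults to the class name."""
        return {"reader": type(self).__name__}


class StaticThermalReader(BaseReader):
    """Toy reader that returns a fixed temperature, for illustration only."""

    def __init__(self, temp_c: float) -> None:
        self._temp_c = temp_c

    def read(self) -> dict[str, Any]:
        return {"thermal_zone0_c": self._temp_c}


def snapshot(readers: list[BaseReader]) -> dict[str, dict[str, Any]]:
    """Take one on-demand reading from every reader, keyed by class name."""
    return {type(r).__name__: r.read() for r in readers}
```

Calling `snapshot()` once before and once after a workload gives the start/stop pair; continuous sampling would instead call `read()` from a loop at the reader's own rate.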


โš™๏ธ Stage 3: EnergyEngine

The EnergyEngine orchestrates all readers from a single sampling thread, so every sample is timestamped against the same clock:

Sampling Thread

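```python
import time

def _sampling_loop(self):
    """100Hz sampling thread"""
    interval = 1.0 / self.sampling_rate_hz
    sample_counter = 0
    next_deadline = time.monotonic()

    while self._sampling_active:
        now = time.time()  # wall-clock timestamp stored with the sample
        energy = self.rapl.read_energy_safe()
        self._sampling_queue.put((now, energy))

        # Sample interrupts every 10th iteration (10Hz)
        if sample_counter % 10 == 0:
            self.scheduler.sample_interrupts()

        sample_counter += 1
        # Sleep until the next deadline instead of a fixed interval, so the
        # time spent reading counters does not lower the effective rate.
        next_deadline += interval
        time.sleep(max(0.0, next_deadline - time.monotonic()))
```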
Data Collected Per Sample

| Time | Package (µJ) | Core (µJ) | Uncore (µJ) | DRAM (µJ) |
|------|--------------|-----------|-------------|-----------|
| t₁ | 1,234,567 | 890,123 | 234,567 | 78,901 |
| t₂ | 1,235,678 | 891,234 | 235,678 | 79,012 |
| t₃ | 1,236,789 | 892,345 | 236,789 | 79,123 |
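RAPL counters are cumulative, so power has to be derived from the difference between consecutive samples. The sketch below shows one way to do that for a single domain. The function name is hypothetical, and the wraparound handling assumes the counter range is supplied by the caller (on Linux it can be read from the zone's `max_energy_range_uj` file).

```python
from typing import Iterable


def samples_to_power_watts(
    samples: Iterable[tuple[float, int]],
    max_energy_range_uj: int,
) -> list[tuple[float, float]]:
    """Convert (timestamp_s, cumulative_energy_uj) pairs to (timestamp_s, watts).

    Hypothetical helper. Each output point is the average power over the
    interval ending at that timestamp.
    """
    power: list[tuple[float, float]] = []
    prev_t = prev_e = None
    for t, e in samples:
        if prev_t is not None and t > prev_t:
            delta_uj = e - prev_e
            if delta_uj < 0:
                # The counter wrapped around between the two samples.
                delta_uj += max_energy_range_uj
            power.append((t, delta_uj / 1e6 / (t - prev_t)))
        prev_t, prev_e = t, e
    return power
```

With the package values from the table, each 10 ms step adds 1,111 µJ, which corresponds to roughly 0.11 W.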

📊 Stage 4: Sample Processing

Samples are processed through the 3-layer data model:

### Layer 1: RawEnergyMeasurement (Immutable)

```python
{
    'measurement_id': 'meas_12345',
    'start_time': 1734567890.123,
    'end_time': 1734567900.456,
    'rapl_start_uj': {'package-0': 1234567, 'core': 890123},
    'rapl_end_uj': {'package-0': 1334567, 'core': 990123},
    'samples': [(t₁, energy₁), (t₂, energy₂), ...],
    'perf_data': {...},
    'turbostat_data': {...}
}
```

### Layer 2: BaselineMeasurement

```python
{
    'baseline_id': 'baseline_12345',
    'power_watts': {'package-0': 2.3, 'core': 1.1},
    'duration_seconds': 10,
    'sample_count': 1000,
    'std_dev_watts': {'package-0': 0.1, 'core': 0.05}
}
```

### Layer 3: DerivedEnergyMeasurement

```python
{
    'workload_energy_uj': 100000,      # package - idle
    'reasoning_energy_uj': 60000,      # core - idle_core
    'orchestration_tax_uj': 40000,     # workload - reasoning
    'ipc': 2.5,
    'cache_miss_rate': 0.03
}
```
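The comments in the Layer 3 example suggest how the derived values relate to the first two layers. A minimal sketch of that arithmetic is shown below. The function name `derive_energy` is hypothetical, and the code assumes that "idle" means baseline power multiplied by the measurement duration and that the counters did not wrap during the run.

```python
def derive_energy(raw: dict, baseline: dict) -> dict:
    """Derive Layer 3 energy values from a Layer 1 and a Layer 2 record.

    Hypothetical helper: idle energy is taken to be baseline power times
    the measurement duration; counter wraparound is ignored.
    """
    duration_s = raw["end_time"] - raw["start_time"]

    def delta_uj(domain: str) -> float:
        # Energy consumed in this domain over the measurement window.
        return raw["rapl_end_uj"][domain] - raw["rapl_start_uj"][domain]

    def idle_uj(domain: str) -> float:
        # Watts * seconds = joules; scale to microjoules.
        return baseline["power_watts"][domain] * duration_s * 1e6

    workload = delta_uj("package-0") - idle_uj("package-0")   # package - idle
    reasoning = delta_uj("core") - idle_uj("core")            # core - idle_core
    return {
        "workload_energy_uj": round(workload),
        "reasoning_energy_uj": round(reasoning),
        "orchestration_tax_uj": round(workload - reasoning),  # workload - reasoning
    }
```

The function returns a new dictionary and never modifies `raw`, in keeping with the immutability of Layer 1.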


💾 Stage 5: Database Storage

Data is stored across 11 tables for complete lineage:

### Database Schema Overview

The database consists of 11 tables with the following relationships:

#### Core Tables

| Table | Primary Key | Foreign Keys | Description |
|-------|-------------|--------------|-------------|
| `experiments` | `exp_id` | - | Experiment metadata |
| `runs` | `run_id` | `exp_id`, `hw_id`, `baseline_id` | Core run data (80+ columns) |
| `hardware_config` | `hw_id` | - | Hardware fingerprints |
| `environment_config` | `env_id` | - | Software environment |

#### High-Frequency Sample Tables

| Table | Frequency | Foreign Key | Description |
|-------|-----------|-------------|-------------|
| `energy_samples` | 100Hz | `run_id` | RAPL energy samples |
| `cpu_samples` | 10Hz | `run_id` | CPU frequency, C-state residency |
| `interrupt_samples` | 10Hz | `run_id` | Interrupt rates |
| `thermal_samples` | 1Hz | `run_id` | Temperature samples |

#### Orchestration Tables

| Table | Primary Key | Foreign Keys | Description |
|-------|-------------|--------------|-------------|
| `orchestration_events` | `event_id` | `run_id` | Agent step tracking |
| `orchestration_tax_summary` | `comparison_id` | `linear_run_id`, `agentic_run_id` | Per-pair tax calculations |
| `llm_interactions` | `interaction_id` | `run_id` | LLM prompts and responses |

Relationship Diagram

```text
experiments ──────────┐
                      ▼
hardware_config ──► runs ◄── environment_config
                      │
                      ├──► energy_samples
                      ├──► cpu_samples
                      ├──► interrupt_samples
                      ├──► thermal_samples
                      ├──► orchestration_events
                      └──► llm_interactions

orchestration_events ──► orchestration_tax_summary
```

#### Key Relationships

- One `experiment` has many `runs`
- One `run` has many samples in all sample tables
- One `run` has many `orchestration_events`
- Two `runs` (linear + agentic) form one `orchestration_tax_summary`
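To make the one-to-many relationships concrete, here is a heavily reduced sketch of three of the tables in SQLite. Only `exp_id`, `run_id`, `workflow_type`, and `timestamp_ns` appear elsewhere in this document; every other column, and the `open_db` helper itself, is a hypothetical placeholder rather than the real schema.

```python
import sqlite3

# Reduced illustration of the experiments -> runs -> energy_samples chain.
SCHEMA = """
CREATE TABLE experiments (
    exp_id INTEGER PRIMARY KEY,
    name   TEXT                        -- hypothetical column
);
CREATE TABLE runs (
    run_id        INTEGER PRIMARY KEY,
    exp_id        INTEGER NOT NULL REFERENCES experiments(exp_id),
    workflow_type TEXT                 -- e.g. 'linear' or 'agentic'
);
CREATE TABLE energy_samples (
    sample_id    INTEGER PRIMARY KEY,  -- hypothetical column
    run_id       INTEGER NOT NULL REFERENCES runs(run_id),
    timestamp_ns INTEGER NOT NULL,
    package_uj   INTEGER               -- hypothetical column
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create the reduced schema (hypothetical helper)."""
    conn = sqlite3.connect(path)
    # SQLite does not enforce foreign keys unless this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

With foreign keys enabled, inserting a sample whose `run_id` has no matching row in `runs` raises `sqlite3.IntegrityError`, which is what keeps every sample traceable to its run.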


Sample Storage Rates

| Table | Sampling Rate | Typical Rows per Run |
|-------|---------------|----------------------|
| `energy_samples` | 100 Hz | 100-1000 |
| `cpu_samples` | 10 Hz | 10-100 |
| `interrupt_samples` | 10 Hz | 10-100 |
| `thermal_samples` | 1 Hz | 1-10 |
🔄 End-to-End Data Journey
### Linear vs Agentic Workflow Examples

#### Linear Workflow Timeline

| Step | Timing | Details |
|------|--------|---------|
| **Start** | `t₀ = 1734567890.123` | Measurement begins |
| **LLM Call** | 850ms API + 150ms compute | Single LLM request |
| **End** | `t₁ = 1734567891.123` | Measurement ends (Δ = 1.000s) |
| **Sampling** | 100 samples @ 10ms intervals | High-frequency energy data |
| **Derivation** | - | Workload = 1.2J, Baseline = 0.3J, Dynamic = 0.9J |
| **Storage** | - | 1 run record, 100 energy samples, 10 CPU samples, 10 interrupt samples |

#### Agentic Workflow Timeline

| Phase | Duration | Description |
|-------|----------|-------------|
| **Planning** | 0.3s | LLM call #1 - Creates execution plan |
| **Tool Call** | 0.1s | Calculator execution |
| **Reasoning** | 0.4s | LLM call #2 - Interprets results |
| **Tool Call** | 0.5s | Web search |
| **Synthesis** | 0.2s | LLM call #3 - Combines results into final answer |
| **Total** | 1.5s | Cumulative execution time |

#### Storage Comparison

| Metric | Linear | Agentic |
|--------|--------|---------|
| Run Records | 1 | 1 |
| Energy Samples | 100 | 150 |
| CPU Samples | 10 | 15 |
| Interrupt Samples | 10 | 15 |
| Orchestration Events | 0 | 5 |
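The sample counts in this comparison follow directly from run duration multiplied by sampling rate. The sketch below reproduces that arithmetic. The rates come from the tables in this document, while the function name and the choice to round down partial samples are assumptions.

```python
# Sampling rates per table, as listed under "Sample Storage Rates".
SAMPLING_RATES_HZ = {
    "energy_samples": 100,
    "cpu_samples": 10,
    "interrupt_samples": 10,
    "thermal_samples": 1,
}


def expected_sample_counts(duration_s: float) -> dict[str, int]:
    """Estimate rows per table for a run of the given duration.

    Hypothetical helper. Partial samples are rounded down; the small
    epsilon guards against floating-point products such as 2.9999999.
    """
    return {
        table: int(duration_s * rate_hz + 1e-9)
        for table, rate_hz in SAMPLING_RATES_HZ.items()
    }
```

For the 1.0 s linear run this gives 100 energy, 10 CPU, and 10 interrupt samples; for the 1.5 s agentic run it gives 150, 15, and 15. Real runs may differ by a sample or two depending on thread start-up and shutdown timing.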

Key Differences

  • Agentic workflows involve multiple LLM calls and tool executions
  • Orchestration events track each phase for tax calculation
  • Higher sample counts due to longer execution time
  • More complex data enables orchestration tax analysis

📈 Data Flow by Sampling Rate

| Rate | Data Type | Purpose |
|------|-----------|---------|
| 100Hz | Energy samples | Precise energy curves, transient analysis |
| 10Hz | CPU metrics | Frequency scaling, C-state transitions |
| 10Hz | Interrupts | I/O pressure, scheduler activity |
| 1Hz | Temperature | Thermal trends, cooling analysis |
| Per-run | Aggregates | Statistical analysis, ML features |

🔍 Debugging Data Flow

Check Sample Counts

```sql
SELECT
    r.run_id,
    r.workflow_type,
    (SELECT COUNT(*) FROM energy_samples WHERE run_id = r.run_id) AS energy,
    (SELECT COUNT(*) FROM cpu_samples WHERE run_id = r.run_id) AS cpu,
    (SELECT COUNT(*) FROM interrupt_samples WHERE run_id = r.run_id) AS irq
FROM runs r
WHERE r.exp_id = (SELECT MAX(exp_id) FROM experiments);
```

Check for Duplicate Timestamps

```sql
SELECT
    timestamp_ns,
    COUNT(*) AS count,
    GROUP_CONCAT(run_id) AS runs
FROM energy_samples
GROUP BY timestamp_ns
HAVING COUNT(*) > 1;
```
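The duplicate-timestamp check can also be scripted. The sketch below runs a trimmed version of the query through Python's `sqlite3` module against a throwaway in-memory table. The helper names and the demo table layout (only `run_id` and `timestamp_ns`) are assumptions made for the example.

```python
import sqlite3

DUPLICATE_TIMESTAMPS_SQL = """
SELECT timestamp_ns, COUNT(*) AS count
FROM energy_samples
GROUP BY timestamp_ns
HAVING COUNT(*) > 1
ORDER BY timestamp_ns
"""


def find_duplicate_timestamps(conn: sqlite3.Connection) -> list[tuple[int, int]]:
    """Return (timestamp_ns, occurrences) for timestamps stored more than once."""
    return conn.execute(DUPLICATE_TIMESTAMPS_SQL).fetchall()


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory table with one deliberate duplicate (demo data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE energy_samples (run_id INTEGER, timestamp_ns INTEGER)")
    conn.executemany(
        "INSERT INTO energy_samples VALUES (?, ?)",
        [(1, 1000), (1, 2000), (2, 2000), (2, 3000)],
    )
    return conn
```

An empty result means no two samples share a timestamp. In the demo data, timestamp 2000 is reported because it appears in both run 1 and run 2.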

🎯 Key Data Flow Principles

Immutable Raw Data - Never modify original measurements

Timestamp Precision - Nanosecond-resolution timestamps for correlation

Complete Lineage - Every derived value traces to raw data

Separation of Concerns - Collection, processing, storage are distinct

Reproducibility - Same inputs produce same outputs
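The first principle, immutable raw data, can be enforced at the language level. The sketch below models a subset of the Layer 1 fields as a frozen dataclass. It is illustrative only: the real `RawEnergyMeasurement` may be implemented differently, and `freeze_counters` is a hypothetical helper.

```python
from dataclasses import dataclass, FrozenInstanceError
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class RawEnergyMeasurement:
    """Subset of the Layer 1 fields; attributes cannot be reassigned."""

    measurement_id: str
    start_time: float
    end_time: float
    rapl_start_uj: Mapping[str, int]
    rapl_end_uj: Mapping[str, int]


def freeze_counters(counters: dict[str, int]) -> Mapping[str, int]:
    """Copy the counters and wrap them in a read-only view.

    frozen=True only blocks attribute assignment; without this wrapper the
    nested dictionaries could still be mutated in place.
    """
    return MappingProxyType(dict(counters))
```

Attempting `m.start_time = 0.0` on an instance raises `FrozenInstanceError`, and writing to a wrapped counter mapping raises `TypeError`, so derived layers have to build new objects instead of editing raw ones.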

This data flow document corresponds to the diagram at ../assets/diagrams/data-flow.svg