Runlog

Documentation

RunLogger Docs

Sections

Installation

Quick Start

API Reference

Offline Mode

PyTorch

HuggingFace

Keras

XGBoost

Artifacts

System Stats

Collaboration

Plans

Terminal Capture

Manual Sync

Auto Names

Tags & Notes

Error Handling

FAQ

Installation

pip install runlog-sdk

Or install from source:

pip install git+https://github.com/runlog-in/runlog-sdk.git

Quick Start

from runlogger import RunLogger

logger = RunLogger(
    base_url      = "https://runlog.in",
    project_name  = "my-project",              # created automatically if missing
    api_token     = "rl-gb-...",               # Dashboard → API Tokens
    run_name      = "run-1",                   # optional; auto-generated if omitted
    config        = {"model": "gpt2", "params": "125M"},
    tags          = ["baseline", "v1"],
    offline_mode  = True,                      # preserve data if connection drops
)

# train_one_step, evaluate, save_checkpoint, and scheduler are placeholders
# for your own training code.
best = float("inf")                            # lowest validation loss so far

for step in range(1000):
    loss = train_one_step()

    logger.log(step=step, total_steps=1000, loss=loss, lr=scheduler.get_last_lr()[0])

    if step % 100 == 0:
        val_loss = evaluate()
        is_best  = val_loss < best
        best     = min(best, val_loss)
        logger.log_eval(step=step, val_loss=val_loss, is_best=is_best)

    if logger.should_pause():
        save_checkpoint(step)
        logger.finish("paused")
        break
else:
    logger.finish()                            # runs only if the loop was not paused

API Reference

RunLogger()

Creates a new run and connects to the dashboard.

base_url (str): Dashboard URL, e.g. https://runlog.in

api_token (str): Your API token from Dashboard → API Tokens

project_name (str): Project name; auto-created if missing

run_name (str): Name for this run; auto-generated if not provided (e.g. cosmic-nebula-42)

config (dict): Hyperparameters and metadata. Visible on the run page.

start_step (int): Step to start from. Use when resuming from a checkpoint. Default: 0.

tags (list): Run tags, e.g. ["baseline", "fp16", "v2"]

notes (str): Free-text description of the run.

log_system_stats (bool): Auto-attach GPU/CPU/RAM stats to every log() call. Default: True.

offline_mode (bool): Preserve data locally if connection is unavailable. Syncs automatically on reconnect. Default: True. Requires a supported plan.

capture_terminal (bool): Capture stdout/stderr and stream terminal output to the dashboard alongside metrics. Default: True.

verbose (bool): Print internal debug info: packet counts, sync intervals, orphan recovery detail. Default: False.

metrics (list): Optional list of metric names you plan to log. Metrics are tracked automatically, so this is rarely needed. Default: [].

logger = RunLogger(
    base_url         = "https://runlog.in",
    api_token        = "rl-gb-...",
    project_name     = "llm-pretraining",
    run_name         = "gpt2-run-3",
    config           = {"params": "125M", "batch_size": 32, "max_steps": 50000},
    start_step       = 5000,
    tags             = ["fp16", "warmup-cosine"],
    notes            = "Resume from best checkpoint, new LR schedule",
    log_system_stats = True,
    offline_mode     = True,
)

logger.log(step, **kwargs)

Log training metrics at the current step. Pass any keyword arguments — each becomes a chart on the dashboard. total_steps enables the progress bar. Buffering and rate limiting are handled automatically.

logger.log(
    step           = step,
    total_steps    = total_steps,
    train_loss     = loss.item(),
    lr             = scheduler.get_last_lr()[0],
    tokens_per_sec = tokens_per_sec,
    total_tokens   = step * batch_size * seq_len,
    eta_seconds    = (total_steps - step) * step_time,
)
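
The derived values in the call above (tokens_per_sec, total_tokens, eta_seconds) are plain arithmetic on quantities the training loop already has. A minimal sketch, assuming a hypothetical helper named throughput_metrics (not part of RunLogger) and that step_time is the measured wall-clock duration of one step in seconds:

```python
def throughput_metrics(step, total_steps, step_time, batch_size, seq_len):
    """Return derived metrics as a dict that could be passed to logger.log(**metrics)."""
    tokens_per_step = batch_size * seq_len
    return {
        "tokens_per_sec": tokens_per_step / step_time,
        "total_tokens": step * tokens_per_step,
        "eta_seconds": (total_steps - step) * step_time,
    }


# Example: step 100 of 1000, half a second per step.
metrics = throughput_metrics(step=100, total_steps=1000, step_time=0.5,
                             batch_size=32, seq_len=1024)
```

The ETA here assumes every remaining step takes as long as the last one; averaging step_time over a window would give a steadier estimate.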

logger.log_eval(step, **kwargs)

Log evaluation metrics. Tracked separately from training metrics on the dashboard. Pass is_best=True to flag the current best checkpoint.

logger.log_eval(
    step             = step,
    val_loss         = val_loss,
    ppl              = math.exp(val_loss),
    accuracy         = accuracy,
    is_best          = is_best,
    checkpoint_saved = is_best,
    checkpoint_path  = "checkpoints/best.pt" if is_best else None,
)
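
The is_best flag in the call above has to be computed by your own code. A minimal sketch, assuming lower validation loss is better; BestTracker is a hypothetical helper, not part of RunLogger:

```python
import math


class BestTracker:
    """Tracks the lowest validation loss seen so far (hypothetical helper)."""

    def __init__(self):
        self.best = math.inf

    def update(self, val_loss):
        """Return True and record val_loss if it beats the previous best."""
        is_best = val_loss < self.best
        if is_best:
            self.best = val_loss
        return is_best


tracker = BestTracker()
flags = [tracker.update(v) for v in (2.0, 1.5, 1.7)]  # third value is worse than 1.5
ppl = math.exp(1.5)                                   # perplexity for val_loss = 1.5
```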

logger.log_artifact(path, name, type, metadata=None)

Upload a file artifact attached to the run. Artifacts appear in the run's Artifacts panel. Supported types: model | dataset | image | file.

path (str): Local path to the file.

name (str): Display name on the dashboard, e.g. "best-model".

type (str): model | dataset | image | file

metadata (dict): Optional key-value info, e.g. {"val_loss": 0.42, "step": 5000}.

logger.log_artifact("checkpoints/best.pt",
                    name="best-model", type="model",
                    metadata={"val_loss": 0.42, "step": 5000})

logger.log_artifact("data/train.csv",
                    name="training-data", type="dataset",
                    metadata={"rows": 50000})

logger.log_artifact("outputs/confusion_matrix.png",
                    name="confusion-matrix", type="image")

logger.should_pause()

Returns True if a pause was triggered from the dashboard. Call once per step — the flag clears automatically after being read.

if logger.should_pause():
    save_checkpoint(step)
    logger.finish("paused")
    sys.exit(0)
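
The read-and-clear behaviour described above is why should_pause() should be called only once per step: a second call in the same step would report False. The sketch below illustrates that semantics with a hypothetical PauseFlag class. It is not RunLogger's implementation, only a model of the documented behaviour:

```python
import threading


class PauseFlag:
    """Illustrative flag that resets itself when read."""

    def __init__(self):
        self._lock = threading.Lock()
        self._requested = False

    def request(self):
        """Simulates a pause being triggered from the dashboard."""
        with self._lock:
            self._requested = True

    def should_pause(self):
        """Return the current flag value and clear it in one atomic step."""
        with self._lock:
            requested, self._requested = self._requested, False
            return requested


flag = PauseFlag()
flag.request()
first = flag.should_pause()    # sees the pause request
second = flag.should_pause()   # flag was already cleared by the first read
```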

logger.finish(status)

Mark the run as done. Always call this at the end of your script. Status options: completed | crashed | paused. Waits up to 10 seconds for any pending data before closing.

Context Manager

Automatically calls finish("completed") on normal exit and finish("crashed") if an exception is raised.

with RunLogger(...) as logger:
    for step in range(steps):
        logger.log(step=step, loss=loss)
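
To make the exit behaviour concrete, the sketch below uses a stand-in class, DemoLogger, which only records the status it is given. It is a hypothetical illustration of the documented behaviour (including that the exception is not suppressed), not RunLogger's source:

```python
class DemoLogger:
    """Stand-in for illustration; records the status instead of contacting a server."""

    def __init__(self):
        self.status = None

    def finish(self, status):
        self.status = status

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.finish("crashed" if exc_type is not None else "completed")
        return False  # returning False lets the exception propagate


ok = DemoLogger()
with ok:
    pass  # normal exit

failed = DemoLogger()
try:
    with failed:
        raise ValueError("simulated training failure")
except ValueError:
    propagated = True
else:
    propagated = False
```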

Offline Mode

RunLogger's offline mode is designed for real-world training conditions where connections are unreliable. Enable it once — everything else is automatic.

logger = RunLogger(
    ...,
    offline_mode = True,   # default
)

What it does

When offline_mode=True:

Mid-run disconnect: Training continues uninterrupted. All data is preserved locally and synced automatically when the connection is restored, in order and with no gaps.

Start offline: You can start training with no connection at all. Everything is buffered locally and uploaded on the next successful connection.

Crashed or killed runs: If your process is killed mid-training, all data logged before the crash is preserved. The next time you start a run from the same directory, it is recovered and synced automatically, with no manual steps.

Plan limit mid-run: If your daily log limit is reached, data that could not be uploaded is held locally. It is automatically uploaded the next day when your limit resets.

Terminal logs: All terminal output is captured and streamed to the dashboard in real time. If offline, chunks are stored locally and flushed on reconnect.
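
The buffer-then-sync behaviour listed above can be pictured as a small on-disk queue. The sketch below is an illustration only: the LocalBuffer class, its table layout, and the send callback are assumptions made for the example, not RunLogger's actual storage format.

```python
import json
import sqlite3


class LocalBuffer:
    """Illustrative queue backed by SQLite; records are flushed oldest-first."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS packets ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self.db.commit()

    def append(self, record):
        self.db.execute("INSERT INTO packets (payload) VALUES (?)",
                        (json.dumps(record),))
        self.db.commit()

    def flush(self, send):
        """Try to send every buffered record in order; stop at the first failure."""
        sent = 0
        rows = self.db.execute(
            "SELECT id, payload FROM packets ORDER BY id").fetchall()
        for row_id, payload in rows:
            if not send(json.loads(payload)):
                break  # still offline: keep this record and everything after it
            self.db.execute("DELETE FROM packets WHERE id = ?", (row_id,))
            self.db.commit()
            sent += 1
        return sent

    def pending(self):
        return self.db.execute("SELECT COUNT(*) FROM packets").fetchone()[0]


buffer = LocalBuffer()
for step in range(3):
    buffer.append({"step": step, "loss": 1.0 / (step + 1)})

received = []


def upload(record):
    """Pretend upload that always succeeds."""
    received.append(record)
    return True


offline_sent = buffer.flush(lambda record: False)  # connection down: nothing leaves
online_sent = buffer.flush(upload)                 # connection back: all three sent
```

Because a record is deleted only after it is sent, calling flush again is harmless: already-delivered records are gone and nothing is uploaded twice.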

Plan requirement

Offline mode requires a supported plan. If your plan does not include it, it is disabled automatically at startup with a warning. If your plan is upgraded mid-run, offline mode activates immediately — no restart needed.

When to use

Long training runs: offline_mode=True

Unstable or intermittent network: offline_mode=True

Short scripts, stable connection: Either

No local disk writes allowed: offline_mode=False

PyTorch

import time
import torch
from runlogger import RunLogger

# model, criterion, optimizer, scheduler, x, y, val_loader, evaluate, and the
# size/step constants are placeholders for your own training code.
logger = RunLogger(
    base_url     = "https://runlog.in",
    api_token    = "rl-gb-...",
    project_name = "my-project",
    run_name     = "pytorch-run",
    config       = {"arch": "gpt2", "batch_size": batch_size},
    offline_mode = True,
)

best_loss = float("inf")

try:
    for step in range(total_steps):
        step_start = time.perf_counter()

        optimizer.zero_grad()            # clear gradients from the previous step
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
        scheduler.step()

        step_time = time.perf_counter() - step_start

        logger.log(
            step           = step,
            total_steps    = total_steps,
            train_loss     = loss.item(),
            lr             = scheduler.get_last_lr()[0],
            tokens_per_sec = batch_size * seq_len / step_time,
        )

        if step % eval_every == 0:
            val_loss = evaluate(model, val_loader)
            is_best  = val_loss < best_loss
            if is_best:
                best_loss = val_loss     # remember the new best
                torch.save(model.state_dict(), "best.pt")
            logger.log_eval(step=step, val_loss=val_loss, is_best=is_best,
                            checkpoint_path="best.pt" if is_best else None)

        if logger.should_pause():
            torch.save(model.state_dict(), f"pause_{step}.pt")
            logger.finish("paused")
            break
    else:
        logger.finish("completed")       # skipped when the loop exits via break
except Exception:
    logger.finish("crashed")
    raise

For multi-GPU / DDP training, log only from rank 0:

if rank == 0:
    logger.log(step=step, loss=loss)
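
The snippet above assumes a rank variable is already defined. One common way to obtain it, sketched here under the assumption that the job is launched with torchrun (which sets the RANK environment variable for each process), is the hypothetical helper below:

```python
import os


def is_main_process():
    """True for global rank 0, and also when RANK is unset (single-process run)."""
    return int(os.environ.get("RANK", "0")) == 0


# Guard logging calls with this flag, e.g. `if should_log: logger.log(...)`.
should_log = is_main_process()
```

If torch.distributed is already initialised, torch.distributed.get_rank() returns the same global rank.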

HuggingFace Trainer

from runlogger import RunLogger
from transformers import Trainer, TrainerCallback

class RunLoggerCallback(TrainerCallback):
    def __init__(self, logger):
        self.logger = logger

    def on_log(self, args, state, control, logs=None, **kwargs):
        if logs:
            self.logger.log(step=state.global_step,
                            total_steps=state.max_steps, **logs)

    def on_evaluate(self, args, state, control, metrics=None, **kwargs):
        if metrics:
            self.logger.log_eval(step=state.global_step, **metrics)

    def on_train_end(self, args, state, control, **kwargs):
        self.logger.finish()

# usage:
logger  = RunLogger(..., offline_mode=True)
trainer = Trainer(..., callbacks=[RunLoggerCallback(logger)])

Keras / TensorFlow

import tensorflow as tf
from runlogger import RunLogger

class RunLoggerCallback(tf.keras.callbacks.Callback):
    def __init__(self, logger, total_epochs):
        self.logger       = logger
        self.total_epochs = total_epochs

    def on_epoch_end(self, epoch, logs=None):
        self.logger.log(step=epoch, total_steps=self.total_epochs, **(logs or {}))

    def on_train_end(self, logs=None):
        self.logger.finish()

# usage:
logger = RunLogger(..., offline_mode=True)
model.fit(X, y, epochs=50, callbacks=[RunLoggerCallback(logger, total_epochs=50)])

XGBoost

import xgboost as xgb
from runlogger import RunLogger

class RunLoggerXGBCallback(xgb.callback.TrainingCallback):
    def __init__(self, logger, total_rounds):
        self.logger       = logger
        self.total_rounds = total_rounds

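    def after_iteration(self, model, epoch, evals_log):
        # Flatten {"val": {"rmse": [...]}} into {"val_rmse": latest_value}.
        metrics = {}
        for data, metric_dict in evals_log.items():
            for name, vals in metric_dict.items():
                metrics[f"{data}_{name}"] = vals[-1]
        self.logger.log(step=epoch, total_steps=self.total_rounds, **metrics)
        return False                     # False tells XGBoost to keep training

    def after_training(self, model):
        self.logger.finish()             # mark the run as done
        return model                     # XGBoost expects the model back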

# usage:
logger = RunLogger(..., offline_mode=True)
bst    = xgb.train(params, dtrain, num_boost_round=100,
                   evals=[(dval, "val")],
                   callbacks=[RunLoggerXGBCallback(logger, 100)])

Artifacts

Log any file as an artifact — models, datasets, plots, configs. Artifacts appear in the run's Artifacts panel and stay associated with the run permanently.

model: Model weights or checkpoints (.pt, .pkl, .onnx, …)

dataset: Training or evaluation data files (.csv, .jsonl, …)

image: Plots, confusion matrices, sample outputs

file: Configs, logs, or any other file

# model checkpoint
logger.log_artifact("checkpoints/best.pt",
                    name="best-model", type="model",
                    metadata={"val_loss": 0.42, "step": 5000})

# dataset
logger.log_artifact("data/train.csv",
                    name="training-data", type="dataset",
                    metadata={"rows": 50000, "source": "FineWeb"})

# evaluation plot
logger.log_artifact("outputs/confusion_matrix.png",
                    name="confusion-matrix", type="image")

Automatic System Stats

When optional packages are installed, RunLogger automatically appends hardware metrics to every log() call. These appear as charts alongside your training metrics — no extra code needed.

gpu_util (requires pynvml): GPU utilization (%)

gpu_mem_used (requires pynvml): GPU memory used (MB)

gpu_mem_total (requires pynvml): Total GPU memory (MB)

cpu_util (requires psutil): CPU utilization (%)

ram_used (requires psutil): RAM used (MB)

ram_total (requires psutil): Total system RAM (MB)

# install optional dependencies
pip install pynvml psutil

# disable if not needed
logger = RunLogger(..., log_system_stats=False)

Stats are collected from GPU 0. If no GPU is present, only CPU/RAM metrics are logged. If neither package is installed, system stats are silently skipped.
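
For readers who want to see where such numbers come from, the sketch below gathers the same six metrics directly with psutil and pynvml. It is an illustration of the fallback behaviour described above, not RunLogger's internal code; the function name collect_system_stats and the use of 1024 × 1024 bytes per MB are assumptions.

```python
def collect_system_stats():
    """Best-effort hardware snapshot; anything unavailable is skipped silently."""
    stats = {}
    mb = 1024 * 1024

    try:
        import psutil
        memory = psutil.virtual_memory()
        stats["cpu_util"] = psutil.cpu_percent(interval=None)
        stats["ram_used"] = memory.used / mb
        stats["ram_total"] = memory.total / mb
    except Exception:
        pass  # psutil missing or unusable on this platform

    try:
        import pynvml
        pynvml.nvmlInit()
        try:
            handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0 only
            util = pynvml.nvmlDeviceGetUtilizationRates(handle)
            gpu_memory = pynvml.nvmlDeviceGetMemoryInfo(handle)
            stats["gpu_util"] = util.gpu
            stats["gpu_mem_used"] = gpu_memory.used / mb
            stats["gpu_mem_total"] = gpu_memory.total / mb
        finally:
            pynvml.nvmlShutdown()
    except Exception:
        pass  # pynvml missing, no NVIDIA driver, or no GPU present

    return stats


stats = collect_system_stats()
```

On a machine with neither package installed, the function returns an empty dict.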

Collaboration

Pro and Elite plans support team workspaces. Create a workspace, invite teammates by email, and share projects across your organization.

For team workspaces, go to Workspace in the sidebar. Roles:

admin: Manage members, all projects

member: Create/edit projects, view all

viewer: Read only

Plans

Plans and limits are managed from the dashboard's Plans page. Upgrade or downgrade at any time — changes take effect immediately, even mid-run.

Daily log limit: RunLogger warns you when it is reached and resumes automatically the next day.

Max metrics tracked: Metric keys beyond your plan's limit are ignored.

Log rate: Data is accepted at the rate your plan allows. The most recent value always gets through.

Offline mode: Available on supported plans. Activates and deactivates automatically with plan changes.

Team workspaces: Available on Pro and Elite plans.
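
The log-rate rule above says that data is throttled but the most recent value always gets through. How Runlog enforces this is not documented here; the sketch below shows one way such a property can be achieved, using a hypothetical LatestValueThrottle class that holds back at most one record and always keeps the newest.

```python
class LatestValueThrottle:
    """Illustrative rate limiter: at most one send per interval, newest record wins."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self.last_sent_at = None
        self.pending = None

    def offer(self, record, now):
        """Return the record if it may be sent now, otherwise hold it and return None."""
        if self.last_sent_at is None or now - self.last_sent_at >= self.min_interval:
            self.last_sent_at = now
            self.pending = None
            return record
        self.pending = record  # overwrite: only the newest waiting record survives
        return None

    def drain(self, now):
        """Return the held record once the interval has elapsed, otherwise None."""
        if self.pending is not None and now - self.last_sent_at >= self.min_interval:
            record, self.pending = self.pending, None
            self.last_sent_at = now
            return record
        return None


throttle = LatestValueThrottle(min_interval=1.0)
sent = [throttle.offer({"step": s}, now=t) for s, t in [(0, 0.0), (1, 0.2), (2, 0.4)]]
late = throttle.drain(now=1.0)  # step 1 was overwritten by step 2 while waiting
```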

Terminal Capture

When capture_terminal=True (the default), RunLogger intercepts all stdout and stderr output from your training script and streams it to the dashboard in real time alongside your metrics. No extra code needed — print() statements, tqdm progress bars, and framework logs all appear automatically.

logger = RunLogger(
    ...,
    capture_terminal = True,   # default; streams all print() output to dashboard
)
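
Intercepting output in Python generally means swapping sys.stdout for an object that forwards each write and keeps a copy. The sketch below shows that general technique with hypothetical names (TeeStream, capture_stdout). It is not RunLogger's implementation and covers stdout only:

```python
import io
import sys
from contextlib import contextmanager


class TeeStream:
    """Writes to the original stream and keeps a copy of everything written."""

    def __init__(self, original):
        self.original = original
        self.captured = io.StringIO()

    def write(self, text):
        self.captured.write(text)
        return self.original.write(text)

    def flush(self):
        self.original.flush()

    def __getattr__(self, name):
        # Delegate anything else (isatty, encoding, ...) to the real stream.
        return getattr(self.original, name)


@contextmanager
def capture_stdout():
    """Temporarily replace sys.stdout with a TeeStream, restoring it afterwards."""
    tee = TeeStream(sys.stdout)
    previous, sys.stdout = sys.stdout, tee
    try:
        yield tee
    finally:
        sys.stdout = previous


with capture_stdout() as tee:
    print("step 10 loss 0.93")

captured_text = tee.captured.getvalue()
```

The line still appears in the terminal as usual; the copy in captured_text is what a tool could then stream elsewhere.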

Offline behaviour

If the connection drops mid-run, terminal chunks are stored locally and flushed to the dashboard on reconnect — in order, with no gaps. This requires offline_mode=True.

Disable if needed

logger = RunLogger(
    ...,
    capture_terminal = False,  # raw stdout only, nothing sent to dashboard
)

Disable if your script produces extremely high-frequency output that you don't need on the dashboard, or if you're running in an environment where stdout redirection is not allowed.

Manual Sync CLI

If a run was interrupted and you want to sync its locally buffered data without starting a new run, use the runlogger-sync command:

# scan the default dumps/ directory
runlogger-sync

# scan a specific directory
runlogger-sync --dir /path/to/runs

# sync one specific file
runlogger-sync --file dumps/.runlog_abc123.db

# show full debug output
runlogger-sync --verbose
runlogger-sync -v

Options

--dir: Directory to scan for offline DB files. Default: dumps/

--file: Sync a single specific DB file directly.

--base-url: Server URL, used as a fallback if not stored in the DB. Can also be set via RUNLOGGER_URL.

--token: API token, used as a fallback if not stored in the DB. Can also be set via RUNLOGGER_TOKEN.

--verbose, -v: Show full debug detail: run IDs, per-batch info, log uploads.

Environment variables

export RUNLOGGER_URL=https://runlog.in
export RUNLOGGER_TOKEN=rl-...
runlogger-sync

Notes

The token and server URL are stored inside each DB file, so you usually don't need to pass them manually. Safe to run multiple times — already-synced packets are skipped automatically. Unrecoverable DB files (missing token or payload) are discarded silently.

Auto Run Names

If you don't provide a run_name, one is generated automatically in the format adjective-noun-number:

cosmic-nebula-42
silver-ridge-317
eager-summit-5

Names are readable, memorable, and unique at any practical project scale. You'll see them on the dashboard and in logs. To use a fixed name instead:

logger = RunLogger(
    ...,
    run_name = "gpt2-baseline-run3",
)
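
The adjective-noun-number format described above can be sketched in a few lines. The word lists and the 1 to 999 number range below are assumptions for illustration; the SDK's actual vocabulary and range are not documented here.

```python
import random

# Illustrative word lists, taken from the example names shown above.
ADJECTIVES = ["cosmic", "silver", "eager"]
NOUNS = ["nebula", "ridge", "summit"]


def generate_run_name(rng=random):
    """Build a name such as 'cosmic-nebula-42' from two random words and a number."""
    return f"{rng.choice(ADJECTIVES)}-{rng.choice(NOUNS)}-{rng.randint(1, 999)}"


name = generate_run_name(random.Random(0))  # seeded generator for repeatability
```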

Tags & Notes

Tags

Tags appear on the dashboard and can be used to filter and group runs across a project. Pass any list of strings.

logger = RunLogger(
    ...,
    tags = ["baseline", "bf16", "fineweb", "v2"],
)

Notes

Free-text notes visible on the run detail page. Useful for recording what you're testing in this run.

logger = RunLogger(
    ...,
    notes = "Testing SwiGLU vs GELU: same LR schedule, different FFN.",
)

Error Handling

Errors raised at startup

These are raised immediately as RuntimeError before training begins:

RuntimeError: Invalid API token: rl-...
RuntimeError: [Runlog] account is banned.

Everything else degrades gracefully

Connection lost mid-run: Data is preserved locally if offline_mode=True and retried automatically on reconnect.

Upload failure: Logged to console. Training continues unaffected.

Plan limit reached: RunLogger warns you and stops logging for the rest of the day. Resets at midnight.

Recommended pattern

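try:
    with RunLogger(...) as logger:
        for step in range(max_steps):
            loss = train()               # placeholder for your training step
            logger.log(step=step, loss=loss)
except RuntimeError as e:
    # Also catches RuntimeError raised by your own training code inside the block.
    print(f"RunLogger error: {e}")
    # continue training without logging, or exit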
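
If you would rather keep training when startup fails, one option is to fall back to a logger that does nothing. The sketch below is an assumption-laden illustration: NullLogger and make_logger are hypothetical helpers, and the factory argument stands in for a call such as lambda: RunLogger(...). Only the constructor call is wrapped, so a RuntimeError from your own training code is not mistaken for a logging error.

```python
class NullLogger:
    """Accepts the same calls as a logger but discards everything (hypothetical helper)."""

    def log(self, **kwargs):
        pass

    def log_eval(self, **kwargs):
        pass

    def should_pause(self):
        return False

    def finish(self, status="completed"):
        pass


def make_logger(factory):
    """Call factory(); fall back to NullLogger if startup raises RuntimeError."""
    try:
        return factory()
    except RuntimeError as error:
        print(f"RunLogger error: {error}; continuing without logging")
        return NullLogger()


def failing_factory():
    """Simulates a startup failure such as an invalid token."""
    raise RuntimeError("Invalid API token: rl-...")


logger = make_logger(failing_factory)
logger.log(step=0, loss=1.0)   # safe no-op
logger.finish()
```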

Verbose mode

Pass verbose=True to see internal detail — packet counts, sync intervals, orphan run recovery. Useful for diagnosing connection or sync issues.

logger = RunLogger(..., verbose=True)

FAQ

Do I need to call finish() if I use the context manager?

No — it is called automatically. Normal exit calls finish("completed"). An exception calls finish("crashed"). The exception is not suppressed.

What if I forget finish()?

The run stays marked as running on the dashboard indefinitely. Always call finish() or use the context manager.

Can I use RunLogger with multi-GPU / DDP training?

Yes. Log only from rank 0 to avoid duplicate data:

if rank == 0:
    logger.log(step=step, loss=loss)

Can I log string values as metrics?

No — metric values must be int, float, or bool. Pass strings in config, tags, or notes instead.
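
If your metrics come from a dict that may contain strings, you can separate the loggable values before calling log(). A minimal sketch, assuming a hypothetical helper named split_metrics (not part of RunLogger):

```python
def split_metrics(values):
    """Separate int/float/bool values from everything else."""
    numeric, other = {}, {}
    for key, value in values.items():
        target = numeric if isinstance(value, (bool, int, float)) else other
        target[key] = value
    return numeric, other


numeric, other = split_metrics(
    {"loss": 0.42, "epoch": 3, "converged": False, "optimizer": "adamw"}
)
# numeric could be passed as logger.log(step=..., **numeric);
# other is a candidate for config, tags, or notes.
```

Tensor values are not plain Python numbers and land in the second dict, so convert them first, for example with loss.item() as the PyTorch examples do.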

Can I have multiple loggers in one script?

Yes. Each RunLogger instance is independent and creates its own run.

Does RunLogger affect training performance?

No. All logging is non-blocking — your training loop is never slowed down.

What if my machine is killed mid-training?

If offline_mode=True, all data logged before the crash is preserved and recovered automatically the next time you start a run from the same directory. No manual steps required.

Where are offline DB files stored?

In a dumps/ directory relative to where your training script runs. Files are named .runlog_<run_id>.db and cleaned up automatically after a successful sync.

How do I debug connection or sync issues?

Pass verbose=True to RunLogger(...) to see full internal detail, or use runlogger-sync --verbose for manual sync debugging.

Can I use RunLogger with a self-hosted Runlog instance?

RunLogger is designed exclusively for use with runlog.in. Self-hosted deployments are not supported. Set base_url to https://runlog.in.

Runlog

The training monitor — lightweight, self-hosted, beautiful.

Real-time Streaming

Live metric updates. Watch your loss curve move as training happens.

Multi-run Compare

Overlay train and val loss across runs on a single chart. Spot the best experiment instantly.

Team Workspaces

Invite teammates, assign roles, and share projects across your organization.

API Token Auth

Per-project tokens let you log securely from any machine (Colab, cloud, local).

Checkpoint Tracking

Automatically flags best checkpoints and logs artifact paths alongside your metrics.

Metric Alerts

Set alerts for loss plateaus, threshold crossings, and more. Never miss a crashed run.

Dynamic Charts

Auto-detected from whatever you log. Drag to reorder. Smooth with a slider.

Runs Table

Filter and sort all runs by status, tag, or loss. Built for large experiment histories.

System Stats

Automatic CPU utilization and RAM tracking logged alongside your metrics every step.

Offline Mode

Log runs without a connection. Metrics are queued locally and synced when you're back online.

Terminal Capture

Mirrors stdout and stderr into your run log. Every print and warning saved automatically.

Get in touch

📧 runlog.uk@gmail.com

© 2026 Runlog (runlog.in) All rights reserved.
