Logikon

Analytics for LLM Reasoning Traces.

Highlights • Analytics • Examples • Stay tuned • Docs 🚧

Logikon /\/ is a library for analyzing and scoring the quality of plain-text reasoning traces produced by LLMs (or humans). It reveals the argumentative structure of LLM outputs, visualizes the complexity of the reasoning, and evaluates its quality.

Logikon /\/ allows you to automatically supervise the AI agents in your advanced LLM apps. This can be used for debugging and monitoring your AI assistants, or for evaluating the quality of human–AI interaction.

Logikon /\/ is highly customizable and extensible. You can choose from a variety of metrics, artifacts, and evaluation methods, pick an expert LLM for logical analysis, and even build your own metrics on top of Logikon's artifacts.

Warning

Logikon /\/ is currently in early beta. The API is subject to change. Please be patient, and report any issues you encounter.

Installation

pip install git+https://github.com/logikon-ai/logikon@v0.1.0

See the examples folder for more details.

Highlights

Analyze and score completions with one extra line of code

# LLM generation
prompt = "Vim or Emacs? Reason carefully before submitting your choice."
completion = llm.predict(prompt)

# Analyze and score reasoning 🚀
import logikon

score = logikon.score(prompt=prompt, completion=completion)

#  >>> print(score.info())
#  argmap_size: 13
#  n_root_nodes: 3
#  global_balance: -.23

Configure metrics, artifacts and evaluation methods

import logikon

# Configure scoring methods
config = logikon.ScoreConfig(
    expert_model = "code-davinci-002",  # expert LLM for logical analysis
    metrics = ["argmap_attack_ratio"],  # ratio of objections
    artifacts = ["svg_argmap"],         # argument map as svg
)

# LLM generation
...

# Debug and score reasoning
score = logikon.score(config=config, prompt=prompt, completion=completion)

Analytics

Argumentation quantity metrics

Score the quantity of arguments in the reasoning trace.

  • number of arguments
  • number of central claims
  • density of argumentation network
  • mean strength of arguments

🤔 What for?

👉 Detect where your LLM fails to generate (sufficiently many) reasons when deliberating a decision or justifying an answer it has given—which may lead to poor AI decision-making and undermine AI explainability.
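For illustration, here is a minimal sketch of selecting quantity metrics explicitly. The identifiers "argmap_size" and "n_root_nodes" are taken from the score.info() output above and are assumed to be valid metric names; consult the analyst registry for the authoritative list.

import logikon

# Assumed metric identifiers (taken from the score.info() output above);
# check the analyst registry for the actual names.
config = logikon.ScoreConfig(
    metrics = ["argmap_size", "n_root_nodes"],
)

score = logikon.score(config=config, prompt=prompt, completion=completion)

#  >>> print(score.info())
#  argmap_size: 13
#  n_root_nodes: 3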

Argumentative bias metrics

Score the balance of arguments in the reasoning trace.

  • support/attack bias averaged over all central claims
  • global support/attack balance
  • naive pros/cons ratio

🤔 What for?

👉 Detect whether your (recently updated) advanced LLM app suddenly produces biased reasoning—which may indicate flawed reasoning that reduces your app's performance.
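As a sketch, the argmap_attack_ratio metric shown earlier can be tracked across app versions to spot such regressions; the comparison logic indicated below is illustrative and not part of Logikon.

import logikon

# Ratio of objections in the reconstructed argument map.
config = logikon.ScoreConfig(
    metrics = ["argmap_attack_ratio"],
)

# Score a prompt/completion pair produced by the current app version ...
score = logikon.score(config=config, prompt=prompt, completion=completion)
print(score.info())

# ... and compare the reported ratio against a baseline recorded before the
# last update; a sudden jump suggests newly introduced argumentative bias.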

Argumentation clarity metrics 🚧

Score the presentation of arguments in the reasoning trace.

  • transparency of exposition
  • redundancy of presentation
  • ambiguity of argument articulation
  • veracity of surface logical structure

🤔 What for?

👉 Detect whether your LLM fails to render its reasoning in comprehensible ways—which may impair human-AI interaction, or prevent other AI agents from taking the reasoning fully into account.

For more technical info on our metrics, see our Critical Thinking Zoo notebook and the code's analyst registry.

Argument mapping artifacts

Reveal, represent or visualize the argumentation, based on a charitable and systematic reconstruction of the reasoning trace.

  • pros and cons list
  • ✨fuzzy✨ argument map
  • argument map as svg
  • nested pros cons sunburst

🤔 What for?

👉 Check visualizations rather than read lengthy reasoning traces when debugging your LLM app.
👉 Build your own metrics and evaluations exploiting the deep structure revealed by our artifacts.
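A minimal sketch of requesting the SVG argument map artifact. This README documents the "svg_argmap" artifact name but not how artifacts are retrieved from the returned score object, so the retrieval step below is left as a commented placeholder; see the examples folder for the actual accessor.

import logikon

# Request the argument map as an SVG artifact.
config = logikon.ScoreConfig(
    artifacts = ["svg_argmap"],
)

score = logikon.score(config=config, prompt=prompt, completion=completion)

# Placeholder: how to read the artifact off `score` is not documented here.
# svg = <retrieve "svg_argmap" artifact from score>
# with open("argmap.svg", "w") as f:
#     f.write(svg)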

Argumentative text annotation artifacts 🚧

Annotate reasons, arguments and argumentative relations in LLM-generated argumentative texts.

  • argumentative entity annotation
  • argumentative relation annotation

🤔 What for?

👉 Check visualizations rather than read lengthy reasoning traces when debugging your LLM app.
👉 Build your own metrics and evaluations exploiting our annotations of LLM-generated texts.

For more technical info on our artifacts, see our Critical Thinking Zoo notebook and the code's analyst registry.

Examples

See the examples folder for details and more examples.

Known limitations

  • The ability to correctly relate individual reasons to each other scales with model size and is severely limited for 7B expert models.
  • ...

Stay tuned for

  • More examples #1
  • Integrations with MLOps tools #2
  • Model benchmarks and validation
  • More metrics and artifacts
  • Speedups and optimizations
  • Logikon /\/ Cloud