Logikon

Analytics for LLM Reasoning Traces.

Highlights • Analytics • Examples • Stay tuned • Docs 🚧

Logikon /\/ is a library for analyzing and scoring the quality of plain-text reasoning traces produced by LLMs (or humans). It reveals the argumentative structure of LLM outputs, visualizes the complexity of the reasoning, and evaluates its quality.

Logikon /\/ allows you to automatically supervise the AI agents in your advanced LLM apps. This can be used for debugging and monitoring your AI assistants, or for evaluating the quality of human–AI interaction.

Logikon /\/ is highly customizable and extensible. You can choose from a variety of metrics, artifacts, and evaluation methods, pick an expert LLM for logical analysis, and even build your own metrics on top of Logikon's artifacts.

Warning

Logikon /\/ is currently in early beta. The API is subject to change. Please be patient, and report any issues you encounter.

Installation

pip install git+https://github.com/logikon-ai/logikon@v0.1.0

See the examples folder for more details.

Highlights

Analyze and score completions with one extra line of code

# LLM generation
prompt = "Vim or Emacs? Reason carefully before submitting your choice."
completion = llm.predict(prompt)

# Analyze and score reasoning 🚀
import logikon

score = logikon.score(prompt=prompt, completion=completion)

#  >>> print(score.info())
#  argmap_size: 13
#  n_root_nodes: 3
#  global_balance: -.23

Configure metrics, artifacts and evaluation methods

import logikon

# Configure scoring methods
config = logikon.ScoreConfig(
    expert_model = "code-davinci-002",  # expert LLM for logical analysis
    metrics = ["argmap_attack_ratio"],  # ratio of objections
    artifacts = ["svg_argmap"],         # argument map as svg
)

# LLM generation
...

# Debug and score reasoning
score = logikon.score(config=config, prompt=prompt, completion=completion)

Analytics

Argumentation quantity metrics

Score the quantity of arguments in the reasoning trace.

  • number of arguments
  • number of central claims
  • density of argumentation network
  • mean strength of arguments

🤔 What for?

👉 Detect where your LLM fails to generate (sufficiently many) reasons when deliberating a decision or justifying an answer it has given—which may lead to poor AI decision-making and undermine AI explainability.
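For illustration, here is a minimal sketch of selecting quantity metrics explicitly. The identifiers "argmap_size" and "n_root_nodes" are taken from the score.info() output above and are assumed to be valid metric names; consult the analyst registry for the authoritative list.

import logikon

# Assumed metric identifiers (taken from the score.info() output above);
# check the analyst registry for the actual names.
config = logikon.ScoreConfig(
    metrics = ["argmap_size", "n_root_nodes"],
)

score = logikon.score(config=config, prompt=prompt, completion=completion)

#  >>> print(score.info())
#  argmap_size: 13
#  n_root_nodes: 3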

Argumentative bias metrics

Score the balance of arguments in the reasoning trace.

  • support/attack bias averaged over all central claims
  • global support/attack balance
  • naive pros/cons ratio

🤔 What for?

👉 Detect whether your (recently updated) advanced LLM app suddenly produces biased reasoning—which may indicate flawed reasoning that reduces your app's performance.
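As a sketch, the argmap_attack_ratio metric shown earlier can be tracked across app versions to spot such regressions; the comparison logic indicated below is illustrative and not part of Logikon.

import logikon

# Ratio of objections in the reconstructed argument map.
config = logikon.ScoreConfig(
    metrics = ["argmap_attack_ratio"],
)

# Score a prompt/completion pair produced by the current app version ...
score = logikon.score(config=config, prompt=prompt, completion=completion)
print(score.info())

# ... and compare the reported ratio against a baseline recorded before the
# last update; a sudden jump suggests newly introduced argumentative bias.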

Argumentation clarity metrics 🚧

Score the presentation of arguments in the reasoning trace.

  • transparency of exposition
  • redundancy of presentation
  • ambiguity of argument articulation
  • veracity of surface logical structure

🤔 What for?

👉 Detect whether your LLM fails to render its reasoning in comprehensible ways—which may impair human-AI interaction, or prevent other AI agents from taking the reasoning fully into account.

For more technical info on our metrics, see our Critical Thinking Zoo notebook and the code's analyst registry.

Argument mapping artifacts

Reveal, represent or visualize the argumentation, based on a charitable and systematic reconstruction of the reasoning trace.

  • pros and cons list
  • ✨fuzzy✨ argument map
  • argument map as svg
  • nested pros cons sunburst

🤔 What for?

👉 Check visualizations rather than read lengthy reasoning traces when debugging your LLM app.
👉 Build your own metrics and evaluations exploiting the deep structure revealed by our artifacts.
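A minimal sketch of requesting the SVG argument map artifact. This README documents the "svg_argmap" artifact name but not how artifacts are retrieved from the returned score object, so the retrieval step below is left as a commented placeholder; see the examples folder for the actual accessor.

import logikon

# Request the argument map as an SVG artifact.
config = logikon.ScoreConfig(
    artifacts = ["svg_argmap"],
)

score = logikon.score(config=config, prompt=prompt, completion=completion)

# Placeholder: how to read the artifact off `score` is not documented here.
# svg = <retrieve "svg_argmap" artifact from score>
# with open("argmap.svg", "w") as f:
#     f.write(svg)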

Argumentative text annotation artifacts 🚧

Annotate reasons, arguments and argumentative relations in LLM-generated argumentative texts.

  • argumentative entity annotation
  • argumentative relation annotation

🤔 What for?

👉 Check visualizations rather than read lengthy reasoning traces when debugging your LLM app.
👉 Build your own metrics and evaluations exploiting our annotations of LLM-generated texts.

For more technical info on our artifacts, see our Critical Thinking Zoo notebook and the code's analyst registry.

Examples

See the examples folder for details and more examples.

Known limitations

  • The ability to correctly relate individual reasons to each other scales with model size and is severely limited for 7B expert models.
  • ...

Stay tuned for

  • More examples #1
  • Integrations with MLOps tools #2
  • Model benchmarks and validation
  • More metrics and artifacts
  • Speedups and optimizations
  • Logikon /\/ Cloud