Commit 967fd17

Merge pull request #5 from YukhoY/main
Upload DiGA codes
2 parents 5a9d42b + dad934f

23 files changed: +3988 −1 lines changed

DiGA/README.md

Lines changed: 88 additions & 0 deletions
## Code for Diffusion Guided Meta Agent (DiGA)

### Introduction

In this repository, we provide the core code of [DiGA](https://arxiv.org/pdf/2408.12991), including the training and generation scripts.

`train.py` is the Python script for training the meta controller. `generate.py` is the Python script for running the meta agent, guided by the trained meta controller, to obtain synthetic market trading records. The code for training an RL agent in the simulated environment is provided in the `rltask/train_test_rl.py` script.
### Prerequisites

We recommend using a conda environment. The required packages can be installed with:

```bash
conda env create --file environment.yaml
```

After installation, use `conda activate diga` to activate the environment.
After that, install the required packages following the instructions from [MarS](https://github.com/microsoft/MarS).

Data should be processed into the shape $(N, C, T)$, where $N$ is the number of samples, $C$ is the number of market state parameters, and $T$ is the effective number of minutes in each sample. In our case, $C=2$, where the first dimension is the mid-price log return rate and the second is the number of orders within each minute, and $T=236$, the effective number of minutes in one trading day.
> Original data can be purchased from licensed data vendors (e.g., Wind, Thomson Reuters).
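To make the expected format concrete, here is a minimal sketch (assuming NumPy, with synthetic placeholder values rather than real market data) of producing arrays of this shape with the file naming that `train.py` expects:

```python
import numpy as np

# Hypothetical sizes: 1000 samples, C=2 market state parameters, T=236 minutes.
N, C, T = 1000, 2, 236
rng = np.random.default_rng(0)

data = np.zeros((N, C, T), dtype=np.float32)
data[:, 0, :] = rng.normal(0.0, 1e-4, size=(N, T))  # channel 0: mid-price log return rate
data[:, 1, :] = rng.poisson(50, size=(N, T))        # channel 1: number of orders per minute

# train.py looks for {data_name}_train.npy and {data_name}_vali.npy in --data_path.
np.save("SZAMain_train.npy", data[:800])
np.save("SZAMain_vali.npy", data[800:])
```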
### Code Examples

To train a meta controller:

```bash
python train.py --data_name "SZAMain" --ctrl_type "continuous" --ctrl_target "return" --n_bins 5 --diffsteps 200 --epochs 10 --checkpoints 3 --data_path {your_data_path} --output_path {your_output_path} --seed 0
```

The command above trains a meta controller on the "SZAMain" dataset, controlling on return with the continuous control encoder. By default, the control target (return) is divided into 5 bins, indicating 5 control classes: lower, low, medium, high, higher. The diffusion model in the meta controller is configured to perform 200 diffusion steps and is trained for at most 10 epochs. Make sure the dataset is stored in `{your_data_path}` and named `{data_name}_train.npy` and `{data_name}_vali.npy`. The trained model is saved to `{your_output_path}/DiGA_{data_name}_{ctrl_type}_{ctrl_target}_{seed}/` by default.
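Putting the path conventions together, the layout for this example might look as follows (the checkpoint file name `last.ckpt` is taken from the generation example below; everything else follows the naming described above):

```
{your_data_path}/
├── SZAMain_train.npy
└── SZAMain_vali.npy

{your_output_path}/
└── DiGA_SZAMain_continuous_return_0/
    └── last.ckpt
```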
To generate with the meta agent guided by the meta controller:

```bash
python generate.py --data_name "SZAMain" --ctrl_type "continuous" --ctrl_target "return" --ctrl_class 0 --cond_scale 1 --samsteps 20 --data_path {your_data_path} --output_path {your_output_path} --exp_name {your_exp_name} --checkpoint_path "last.ckpt" --save_name "DiGA_generation" --seed 0
```

The command above first samples from the trained meta controller, conditioned on the `ctrl_class` of `ctrl_target`. If `ctrl_type` is "discrete", `ctrl_class` refers to the selected bin; if `ctrl_type` is "continuous", it refers to the relative strength of `ctrl_target`. `cond_scale` controls the strength of classifier-free guidance during diffusion model sampling. After sampling, the meta agent generates the trading records of one trading day, which are saved to `{output_path}/{exp_name}/{save_name}.pkl`; in the example above, `exp_name` would be `"DiGA_{data_name}_{ctrl_type}_{ctrl_target}_{seed}"`.
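For a quick sanity check of the output, a minimal sketch of loading the generated records (the path follows the example above, with `output` as a placeholder for `{your_output_path}`; the exact structure of the records depends on the simulator):

```python
import pickle
from pathlib import Path

# Placeholder path following the {output_path}/{exp_name}/{save_name}.pkl convention.
path = Path("output/DiGA_SZAMain_continuous_return_0/DiGA_generation.pkl")
with path.open("rb") as f:
    records = pickle.load(f)

print(type(records))  # inspect the structure of the generated trading records
```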
To run RL training in the generated market:

```bash
python rltask/train_test_rl.py --market "DiGA" --data_path {your_data_path} --test_replay_path {your_test_replay_path} --output_path {your_output_path} --save_name {your_save_name}
```

The command above trains an RL agent in a market environment generated by DiGA. For efficiency, it takes pre-computed meta controller samples from `data_path`. The trained agent and testing results are stored in `{output_path}/{save_name}`. For training with the DiGA environment, the file in `data_path` should contain a dict in which each item stores one sample generated by the meta controller. For training (or testing) with the Replay environment, the data in `data_path` (or `test_replay_path`) should contain paths to order and transaction records (both in CSV, preprocessed using the market_simulation library).
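As an illustration of the DiGA-environment input format, a minimal sketch of packing pre-computed meta controller samples into a dict. The file name, dict keys, and per-sample shape $(C, T) = (2, 236)$ here are assumptions for illustration, not the repository's exact schema:

```python
import pickle

import numpy as np

# Hypothetical: 100 pre-computed meta controller samples, one dict item per sample.
rng = np.random.default_rng(0)
samples = {i: rng.normal(size=(2, 236)).astype(np.float32) for i in range(100)}

with open("diga_meta_samples.pkl", "wb") as f:
    pickle.dump(samples, f)  # pass this file via --data_path for the DiGA market
```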
### Argument details

`train.py` accepts the following arguments:
- `--data_name`: The name of the dataset to use for training.
- `--ctrl_type`: The type of control to use (continuous or discrete).
- `--ctrl_target`: The target of the control (e.g., return, volatility).
- `--n_bins`: The number of bins to use for the discretization of `ctrl_target`.
- `--diffsteps`: The number of diffusion steps.
- `--samsteps`: The number of sampling steps.
- `--epochs`: The number of training epochs. Either `epochs` or `maxsteps` should be set; training stops when either one is reached.
- `--maxsteps`: The maximum number of steps for the trainer. Either `epochs` or `maxsteps` should be set; training stops when either one is reached.
- `--batch_size`: The batch size for training.
- `--learning_rate`: The learning rate for the optimizer.
- `--checkpoints`: The number of checkpoints to save.
- `--data_path`: The path to your training data.
- `--output_path`: The path where you want to save your trained model and other output files.
- `--seed`: The seed for random number generation.
- `--num_workers`: The number of workers to use for data loading.
`generate.py` accepts the following arguments:

- `--data_name`: The name of the dataset used for training.
- `--ctrl_type`: The type of control to use (continuous or discrete).
- `--ctrl_target`: The target of the control (e.g., return, volatility).
- `--n_bins`: The number of bins to use for the discretization of `ctrl_target`.
- `--diffsteps`: The number of diffusion steps.
- `--samsteps`: The number of sampling steps.
- `--seed`: The seed for random number generation.
- `--data_path`: The path to your training data.
- `--output_path`: The path where you want to save your trained model and other output files.
- `--exp_name`: The name of the experiment.
- `--checkpoint_path`: The path to the checkpoint file under your `output_path`.
- `--save_name`: The name of the file to save the generated data.
- `--random_price`: Whether to generate a random initial price.
- `--pseudo_price`: The initial price to use if `random_price` is not set.
`rltask/train_test_rl.py` accepts the following arguments:

- `--market`: The type of market environment to use for training. Options are "DiGA" and "Replay".
- `--max_steps`: The maximum number of steps for the trainer.
- `--save_name`: The folder name for saving the RL run.
- `--eval_eps`: The number of evaluation episodes.
- `--data_path`: The path to the data for generating the training environment.
- `--test_replay_path`: The path to the data for generating the testing environment.
- `--output_path`: The path where you want to save your trained model and other output files.
- `--seed`: The seed for random number generation.
Lines changed: 183 additions & 0 deletions
import logging
from typing import Callable, Dict, List, Optional, cast

from pandas import Timestamp

from market_simulation.states.trans_state import TransState
from market_simulation.wd.wd_order import WdOrder
from mlib.core.action import Action
from mlib.core.base_agent import BaseAgent
from mlib.core.base_order import BaseOrder
from mlib.core.observation import Observation
from mlib.core.limit_order import LimitOrder
from mlib.core.orderbook import Orderbook
from mlib.core.state import State
from mlib.core.transaction import Transaction


class ReplayAgent(BaseAgent):
    """An agent that replays the market with recorded orders and verifies the resulting transactions."""

    def __init__(
        self,
        symbol: str,
        orders: List[BaseOrder],
        transactions: List[Transaction],
        on_order_submit: Optional[Callable[["ReplayAgent", BaseOrder], None]] = None,
    ) -> None:
        super().__init__(init_cash=0, communication_delay=0, computation_delay=0)
        self.symbol: str = symbol
        self.orders: List[BaseOrder] = orders
        self.transactions = transactions
        self._next_wakeup_order_index = 0
        self._num_check_transactions = 0
        self.on_order_submit = on_order_submit
        assert self.orders

    def get_next_wakeup_time(self, time: Timestamp) -> Optional[Timestamp]:
        if self._next_wakeup_order_index >= len(self.orders):
            return None
        next_time = self.orders[self._next_wakeup_order_index].time
        self._next_wakeup_order_index += 1
        assert next_time >= time
        return next_time

    def get_action(self, observation: Observation, orderbook: Orderbook) -> Action:
        """Get action given observation.

        It delegates its main functions to:
        - `get_next_wakeup_time` to get the next wakeup time, and
        - `get_orders` to get orders based on observation. `get_orders` will not be called for the first-time wakeup,
          when it's the market open wakeup.

        """
        assert self.agent_id == observation.agent.agent_id
        time = observation.time
        # return empty order for the market open wakeup
        orders: List[BaseOrder] = [] if observation.is_market_open_wakup else self.get_orders(time, orderbook)
        action = Action(
            agent_id=self.agent_id,
            time=time,
            orders=orders,
            next_wakeup_time=self.get_next_wakeup_time(time),
        )
        return action

    def get_orders(self, time: Timestamp, orderbook: Orderbook):
        # The wakeup index was advanced in get_next_wakeup_time, so the current order is one behind.
        cur_order_index = self._next_wakeup_order_index - 1
        assert cur_order_index >= 0
        order = self.orders[cur_order_index]
        assert time == order.time
        if self.on_order_submit is not None:
            self.on_order_submit(self, order)
        validated = [self.validate_order(order, orderbook)]
        return [order for order in validated if order is not None]

    def on_states_update(self, time: Timestamp, symbol_states: Dict[str, Dict[str, State]]):
        super().on_states_update(time, symbol_states)

    def check_new_transactions_match(self):
        # Compare the transactions generated since the last check against the recorded labels.
        state_name = TransState.__name__
        assert state_name in self.symbol_states[self.symbol]
        state = cast(TransState, self.symbol_states[self.symbol][state_name])
        new_trans = state.transactons[self._num_check_transactions :]
        _check_transactions_match(self.transactions, new_trans, False, self._num_check_transactions)
        self._num_check_transactions = len(state.transactons)

    def on_market_close(self, time: Timestamp):
        super().on_market_close(time)
        _check_same_symbol_orders(self.agent_id, self.lob_orders, self.lob_price_orders, self.symbol_states)

    def validate_order(self, order: WdOrder, orderbook: Orderbook):
        if order.type != 'C':
            order = order.get_limit_orders(orderbook)[0]
        else:
            # Type 'C' denotes a cancel order: keep it only if the target order is still in the book.
            valid_cancel_vol = 0
            if order.cancel_id in self.lob_orders[self.symbol].keys():
                to_cancel = self.lob_orders[self.symbol][order.cancel_id]
                valid_cancel_vol = to_cancel.volume

            if valid_cancel_vol != 0:
                order.volume = valid_cancel_vol
            else:
                logging.warning(f"Invalid order {order}.")
                order = None
        if order is not None and order.price <= 0:
            logging.warning(f"Invalid order {order}.")
            order = None

        return order


def _check_same_symbol_orders(
    agent_id: int,
    lob_orders: Dict[str, Dict[int, LimitOrder]],
    lob_price_orders: Dict[str, Dict[int, Dict[int, LimitOrder]]],
    symbol_states: Dict[str, Dict[str, State]],
):
    symbols = lob_orders.keys()
    state_name: str = State.__name__
    for symbol in symbols:
        close_orderbook = symbol_states[symbol][state_name].close_orderbook
        if close_orderbook is None:
            # skip checking as the close orderbook is empty; this happens when there is no close auction.
            continue

        _check_same_orders_on_symbol(
            agent_id=agent_id,
            lob_orders=lob_orders[symbol],
            lob_price_orders=lob_price_orders[symbol],
            orderbook=close_orderbook,
        )


def _check_same_orders_on_symbol(
    agent_id: int,
    lob_orders: Dict[int, LimitOrder],
    lob_price_orders: Dict[int, Dict[int, LimitOrder]],
    orderbook: Orderbook,
):
    remaining_orders: List[LimitOrder] = []
    for level in orderbook.asks + orderbook.bids:
        remaining_orders.extend([x for x in level.orders if x.agent_id == agent_id])
    _check_same_orders(lob_orders, remaining_orders)

    price_orders: List[LimitOrder] = []
    for value in lob_price_orders.values():
        price_orders.extend(value.values())
    _check_same_orders(lob_orders, price_orders)


def _check_same_orders(lob_orders: Dict[int, LimitOrder], orders: List[LimitOrder]):
    assert len(orders) == len(lob_orders)
    for order in orders:
        assert order.order_id in lob_orders
        my_order = lob_orders[order.order_id]
        assert str(order) == str(my_order)


def _check_transactions_match(trans_label: List[Transaction], trans_replay: List[Transaction], output_details: bool = True, label_start: int = 0):
    end = label_start + len(trans_replay)
    assert label_start >= 0
    if len(trans_label) < end:
        logging.error(f"not enough transactions [{label_start}, {end}), only {len(trans_label)}.")
        return False

    if len(trans_replay) == 0:
        return True

    for index in range(len(trans_replay)):
        str_label = str(trans_label[label_start + index])
        str_replay = str(trans_replay[index])
        if str_label == str_replay:
            if output_details:
                logging.info(f"same for {label_start + index}|{index}th trans: {str_label}")
            continue
        logging.error(f"diff for {label_start + index}|{index}th trans")
        logging.info(f"  label: {str_label}")
        logging.info(f"  replay: {str_replay}")
        return False
    return True
