Release v2.3.0

Ayan-Kumar-Saha · Ayan-Kumar-Saha · commit 9fe0757bc4c6 · 2020-08-28T22:13:27.000+05:30
- add support for MCT algorithm
- update documentation
- fix minor bugs
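
Note for reviewers: as far as this diff shows, `algo` is only consumed in `get_partial_transition_matrix`, where it selects the value written for a pair of items. The sketch below isolates that branch so the two modes can be compared side by side. It is an illustration, not the package code: `partial_transition_value` and its `same_item` flag are hypothetical names, and `result` is assumed to be the per-pair count computed earlier in the loop, which this hunk does not show.

```python
def partial_transition_value(result, lists, algo="mc4", same_item=False):
    """Hypothetical helper mirroring the branch changed in this commit.

    result    -- per-pair count computed by the surrounding loop (assumed)
    lists     -- number of rank lists
    same_item -- stands in for the `i == j` check in the original loop
    """
    if algo not in ("mc4", "mct"):
        raise ValueError(f"Invalid ranking algorithm '{algo}'")

    if result == 0 and same_item:
        return -1  # diagonal placeholder, unchanged by this commit
    if result > lists / 2:
        return 0  # strict majority; was `>=` before this commit
    if algo == "mc4":
        return 1  # MC4 writes a flat value
    return (lists - result) / lists  # MCT writes a proportional value


# Compare both modes for every possible count with four rank lists.
for result in range(5):
    mc4_val = partial_transition_value(result, 4, "mc4")
    mct_val = partial_transition_value(result, 4, "mct")
    print(result, mc4_val, mct_val)
```

With four lists, a count of exactly 2 now falls through to the last branch (1 for `mc4`, 0.5 for `mct`), whereas the previous `>=` comparison mapped it to 0.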
diff --git a/README.md b/README.md
@@ -1,11 +1,11 @@
 # Markov Chain Type 4 Rank Aggregation
-**implementation of MC4 Rank Aggregation algorithm using Python**
+**Implementation of the MC4 and MCT rank aggregation algorithms in Python**
 
 ## Description
 
-This project is all about implementing one of the most popular rank aggregation algorithms **Markov Chain Type 4** or **MC4**. In the field of Machine Learning and many other scientific problems, several items are often needed to be ranked based on some criterion. However, different ranking schemes order the items based on different preference criteria. Hence the rankings produced by them may differ greatly.
+This project implements two popular rank aggregation algorithms, **Markov Chain Type 4** (**MC4**) and **MCT**. In machine learning and many other scientific problems, several items often need to be ranked by some criterion. However, different ranking schemes order the items by different preference criteria, so the rankings they produce may differ greatly.
 
-Therefore a rank aggregation technique is often used for combining the individual rank lists into a single aggregated ranking. Though there are many rank aggregation algorithms, MC4 is one of the most renowned ones.
+Therefore, a rank aggregation technique is often used to combine the individual rank lists into a single aggregated ranking. Among the many rank aggregation algorithms, MC4 and MCT are two of the best known.
 
 ## Resource
 
@@ -23,24 +23,31 @@ For a specific release, `pip install mc4=={version}` such as `pip install mc4==1
 
 ## General Usage
 
-Using this package is very easy. You just need the following three lines of code to use the package.
+Using this package is very easy.
+
+1. Prepare a dataset containing ranks of all the items provided by different algorithms. See [here](https://github.com/kalyaniuniversity/MC4/blob/master/test_datasets/README.md) for sample datasets and more info.
+
+2. Use the following lines of code to run the aggregation. Make sure to pass arguments that match your dataset; otherwise, the results will be incorrect.
 
 ```python
 from mc4.algorithm import mc4_aggregator
+import pandas as pd
 
-aggregated_ranks = mc4_aggregator('dataset.csv') 
+# Method 1
+aggregated_ranks = mc4_aggregator('test_dataset_1.csv', header_row = 0, index_col = 0) 
 
-# or 
-
-aggregated_ranks = mc4_aggregator(df) 
+# or Method 2
+df = pd.read_csv('test_dataset_1.csv', header = 0, index_col = 0)
+aggregated_ranks = mc4_aggregator(df, header_row = 0, index_col = 0) 
 
 print(aggregated_ranks)
 ```
-here `dataset.csv` or `df` are lists of ranks provided by different ranking algorithms or rank lists. *You can refer [here](https://github.com/kalyaniuniversity/MC4/blob/master/test_datasets/datasets.md) for more info and some test datasets.*
+Here, `test_dataset_1.csv` is a sample dataset containing the ranks of different items provided by different algorithms.
 
-`mc4_aggregator` takes some additional arguments as well.
+In addition to the mandatory data source, `mc4_aggregator` takes the following optional arguments:
 
-* `order (string)`: order of the dataset, default is `'row'`. More on this, [here](https://github.com/kalyaniuniversity/MC4/blob/master/test_datasets/datasets.md).
+* `algo (string)`: algorithm for rank aggregation, `mc4` or `mct`, default is `mc4`
+* `order (string)`: order of the dataset, `row` or `column`, default is `row`. More on this, [here](https://github.com/kalyaniuniversity/MC4/blob/master/test_datasets/README.md).
 * `header_row (int or None)`: row number of the dataset containing the header, default is `None`
 * `index_col (int or None)`: column number of the dataset containing the index, default is `None`
 * `precision (float)`: acceptable error margin for convergence, default is `1e-07`
@@ -49,49 +56,56 @@ here `dataset.csv` or `df` are lists of ranks provided by different ranking algo
 
 ## Command Line Usage
 
+You can use this package directly from the command line if you already have the dataset prepared.
+
 * To get help and usage details,
     ```shell
     ~$ mc4_aggregator -h or --help
     ```
 
 * Use with default settings,
     ```shell
-    ~$ mc4_aggregator <data source> e.g. mc4_aggregator dataset.csv
+    ~$ mc4_aggregator dataset.csv
+    ```
+
+* Specify the algorithm for rank aggregation using `-a` or `--algo`, options: `mc4` or `mct`, default is `mc4`
+    ```shell
+    ~$ mc4_aggregator dataset.csv -a mct
     ```
 
-* Specify order using `-o`or `--order`, default is `row`
+* Specify order using `-o` or `--order`, options: `row` or `column`, default is `row`
     ```shell
-    ~$ mc4_aggregator <data source> -o <order> e.g. mc4_aggregator dataset.csv -o column
+    ~$ mc4_aggregator dataset.csv -o column
     ```
 
 * Specify header row using `-hr` or `--header_row`, default is `None`
     ```shell
-    ~$ mc4_aggregator <data source> -hr <header row> e.g. mc4_aggregator dataset.csv -hr 1
+    ~$ mc4_aggregator dataset.csv -hr 0
     ```
 
 * Specify index column using `-ic` or `--index_col`, default is `None`
     ```shell
-    ~$ mc4_aggregator <data source> -ic <index column> e.g. mc4_aggregator dataset.csv -ic 1
+    ~$ mc4_aggregator dataset.csv -ic 0
     ```
 
 * Specify precision using `-p` or `--precision`, default is `1e-07`
     ```shell
-    ~$ mc4_aggregator <data source> -p <precision> e.g. mc4_aggregator dataset.csv -p 0.000001
+    ~$ mc4_aggregator dataset.csv -p 0.000001
     ```
 
 * Specify iterations using `-i` or `--iterations`, default is `200`
     ```shell
-    ~$ mc4_aggregator <data source> -i <iterations> e.g. mc4_aggregator dataset.csv -i 300
+    ~$ mc4_aggregator dataset.csv -i 300
     ```
 
 * Specify ergodic number using `-e` or `--erg_number`, default is `0.15`
     ```shell
-    ~$ mc4_aggregator <data source> -p <precision> e.g. mc4_aggregator dataset.csv -e 0.20
+    ~$ mc4_aggregator dataset.csv -e 0.20
     ```
 
 * All together,
     ```shell
-    ~$ mc4_aggregator dataset.csv -o column -hr 1 -ic 1 -p 0.000001 -i 300 -e 0.20
+    ~$ mc4_aggregator dataset.csv -a mct -o column -hr 0 -ic 0 -p 0.000001 -i 300 -e 0.20
     ```
 
 ## Output
diff --git a/mc4/algorithm.py b/mc4/algorithm.py
@@ -48,12 +48,13 @@ def get_matrix_shape(df):
     return rows, cols
 
 
-def get_partial_transition_matrix(df, items, lists):
+def get_partial_transition_matrix(df, algo, items, lists):
 
     """Returns the partial transition matrix from the dataframe containing different ranks
 
     Args:
         df (pandas.core.DataFrame): dataframe object containing different ranks
+        algo (string): mc4 or mct
         items (int): number of items
         lists (int): number of lists
 
@@ -70,10 +71,13 @@ def get_partial_transition_matrix(df, items, lists):
 
             if result == 0 and i==j:
                 val = -1
-            elif result >= (lists/2):
+            elif result > (lists/2):
                 val = 0
             else:
-                val = 1
+                if algo == 'mc4':
+                    val = 1
+                else: 
+                    val = (lists-result) / lists
 
             matrix_input = val
 
@@ -216,12 +220,13 @@ def get_mapped_final_ranks(df, final_ranks, index_col):
     return ranks
 
 
-def mc4_aggregator(source, order = 'row', header_row=None, index_col=None, precision=0.0000001, iterations=200, erg_number=0.15):
+def mc4_aggregator(source, algo='mc4', order = 'row', header_row=None, index_col=None, precision=0.0000001, iterations=200, erg_number=0.15):
 
     """Performs aggregation on different ranks using Markov Chain Type 4 Rank Aggeregation algorithm and returns the aggregated ranks 
 
     Args:
         file_path (string): path of the dataset file containing all different ranks
+        algo (string): mc4 or mct, default is mc4
         order (string): order of the dataset, default is row i.e. row-major
         header_row (int or None): row number of the dataset containing the header, default is None
         index_col (int or None): column number of the dataset containing the index, default is None
@@ -233,6 +238,9 @@ def mc4_aggregator(source, order = 'row', header_row=None, index_col=None, preci
         list: contestantwise aggregated ranks
     """
 
+    if algo not in ['mc4', 'mct']:
+        raise Exception(f"Invalid ranking algorithm '{algo}'")
+
     if isinstance(source, str) and is_csv(source):
 
         if is_valid_path(source):
@@ -251,9 +259,10 @@ def mc4_aggregator(source, order = 'row', header_row=None, index_col=None, preci
     else:
         raise Exception(f"Unsupported data source '{get_filename(source)}'")
 
+
     rows, cols = get_matrix_shape(df)
 
-    partial_transition_matrix = get_partial_transition_matrix(df, rows, cols)
+    partial_transition_matrix = get_partial_transition_matrix(df, algo, rows, cols)
 
     normalized_transition_matrix = get_normalized_transition_matrix(partial_transition_matrix, rows)
 
diff --git a/mc4/command_line.py b/mc4/command_line.py
@@ -4,6 +4,7 @@
 parser = argparse.ArgumentParser(description='Takes necessary inputs for mc4_aggegator')
 
 parser.add_argument('source', type=str, help='source of the lists of ranks')
+parser.add_argument('-a', '--algo', type=str, default='mc4', help='rank aggregation algorithm, mc4 or mct, default is mc4', choices=['mc4', 'mct'])
 parser.add_argument('-o', '--order', type=str, default='row', help='order of the dataset, default is row', choices=['row', 'column'])
 parser.add_argument('-hr', '--header_row', type=int, help='row number of the header, default is None')
 parser.add_argument('-ic', '--index_col', type=int, help='column number of the index, default is None')
@@ -14,5 +15,4 @@
 args = parser.parse_args()
 
 def main():
-    print(mc4_aggregator(args.source, args.order, args.header_row, args.index_col, args.precision, args.iterations, args.erg_number))
-
+    print(mc4_aggregator(args.source, args.algo, args.order, args.header_row, args.index_col, args.precision, args.iterations, args.erg_number))
diff --git a/setup.py b/setup.py
@@ -5,7 +5,7 @@
 
 setup(
     name="mc4",
-    version="2.2.1",
+    version="2.3.0",
     author="Ayan Kumar Saha",
     author_email="ayankumarsaha96@gmail.com",
     description="A python package for implementing Markov Chain Type 4 rank aggregation",
diff --git a/test_datasets/README.md b/test_datasets/README.md