Create KGTK augment command #638

Open: wants to merge 34 commits into base: dev

Conversation

GrantXie
Contributor

No description provided.

@pep8speaks

pep8speaks commented Feb 10, 2022

Hello @GrantXie! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-02-22 10:12:53 UTC

@GrantXie GrantXie closed this Feb 14, 2022
@GrantXie GrantXie reopened this Feb 15, 2022
Member

@saggu saggu left a comment

Every function should have augmented parameters. A lot of changes are required, and some mistakes are repeated from previous PRs. Please update.

@@ -0,0 +1,177 @@
## Summary

This command will augmented graph from a KGTK Edge file with numeric value in float (or date) on node2. This command will automatically detect date in wikidata format and transform it to float in year

Please fix the grammar. Also, I can't understand from this description what the command does. Please update.


### The Output File

The output file is an edge file for each mode that contains the following columns:

What extra edges will be added? Please give an example.

  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        The KGTK output file. (May be omitted or '-' for
                        stdout.)
  --dataset DATASET     Specify the location of dataset.

What exactly is the location of the dataset?

                        The KGTK output file. (May be omitted or '-' for
                        stdout.)
  --dataset DATASET     Specify the location of dataset.
  --train-file-name TRAIN_FILE_NAME

All of these new parameters need longer descriptions.

                        Specify name for training file
  --numerical-literal-name NUM_LITERAL_NAME
                        Specify name for numerical literal file
  --valid-file-name VALID_FILE_NAME

?? I will only add this comment here: add longer, more descriptive help messages.

collections_raw = defaultdict(list)

if train_edges_raw is not None:
    for i, row in train_edges_raw.iterrows():

Remove df.iterrows(); it is the most inefficient way to iterate over a DataFrame.
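
One possible way to act on this comment, as a sketch under assumptions: the diff excerpt does not show what the loop body stores in `collections_raw`, so the example assumes the standard KGTK edge columns `node1`, `label` and `node2`, and assumes the goal is to group `(node1, node2)` pairs by `label`. The sample values are made up. It swaps `iterrows()` for `itertuples()`, which avoids building a pandas Series for every row.

```python
from collections import defaultdict

import pandas as pd

# Toy edge table; column names follow the KGTK edge convention and the
# values are invented for illustration only.
train_edges_raw = pd.DataFrame(
    {
        "node1": ["Q1", "Q2", "Q3"],
        "label": ["P569", "P569", "P2048"],
        "node2": [1950.0, 1987.5, 1.8],
    }
)

collections_raw = defaultdict(list)

if train_edges_raw is not None:
    # itertuples yields lightweight namedtuples instead of one Series per row.
    for row in train_edges_raw.itertuples(index=False):
        # Assumed grouping: the real loop body is not visible in the diff.
        collections_raw[row.label].append((row.node1, row.node2))
```

If the real loop only needs a grouping like this, `DataFrame.groupby("label")` may be an even better fit, since it pushes the iteration into pandas.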

import pandas as pd
from tqdm import tqdm
from bisect import bisect
from kgtk.augment.utils import *

Wildcard import (import *) again.
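
To illustrate why wildcard imports tend to get flagged, here is a small sketch that uses only the standard library, since the names exported by `kgtk.augment.utils` are not shown in this excerpt. It runs `from os import *` in a scratch namespace and checks that the name `open` then points at `os.open` rather than the builtin.

```python
import os

# Run the wildcard import in a scratch namespace so its effect can be inspected
# without polluting this module.
namespace = {}
exec("from os import *", namespace)

# After the wildcard import, `open` refers to os.open, not the builtin open().
shadowed = namespace["open"] is os.open

# Explicit imports keep the origin of every name visible to the reader.
from bisect import bisect
from os.path import join
```

Listing the needed names explicitly would give the same readability benefit for the utilities module.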


def gen_plabel(pnode, unit=None):
    if not unit:
        return pnode + ' (Interval)'

Use f-strings everywhere; be consistent.
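
A minimal sketch of what the f-string version could look like. Only the first branch of `gen_plabel` is visible in the diff excerpt, so the label format used when a unit is given is a hypothetical placeholder, not the author's actual format.

```python
def gen_plabel(pnode, unit=None):
    """Build a label for a property node, optionally qualified by a unit."""
    if not unit:
        return f"{pnode} (Interval)"
    # Hypothetical format: the unit branch is not visible in the diff excerpt.
    return f"{pnode} (Interval, unit {unit})"
```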

parser.add_output_file()

parser.add_argument('--dataset', dest='dataset', type=str,
                    default=None,

Add longer, meaningful help messages.
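
As a rough illustration of a longer help message, the sketch below uses plain `argparse` as a stand-in for the KGTK parser. The help wording is hypothetical: the excerpt does not say what the dataset location actually contains, so the real text has to come from the author.

```python
import argparse

# Plain argparse parser used as a stand-in for the KGTK argument parser.
parser = argparse.ArgumentParser(prog="augment-demo")

parser.add_argument(
    "--dataset",
    dest="dataset",
    type=str,
    default=None,
    # Hypothetical wording; the actual meaning must be confirmed by the author.
    help=(
        "Path to the directory that holds the dataset files, "
        "for example the training, validation and numerical literal files. "
        "State here whether the path is required and how file names from "
        "the other options relate to it."
    ),
)

# Example invocation with a made-up path.
args = parser.parse_args(["--dataset", "/tmp/my_dataset"])
```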

from kgtk.cli_entry import cli_entry
from kgtk.exceptions import KGTKArgumentParseException
import glob


For all of the unit tests, you have to test the content as well, not only the length.
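
A sketch of the difference between a length check and a content check. The helper `read_edges` and the sample output are invented for illustration and are not part of KGTK; in a real test the text would come from the file written by the command under test.

```python
import csv
import io
import unittest


def read_edges(tsv_text):
    """Parse tab-separated edge text into a list of row dicts (hypothetical helper)."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


# Made-up output standing in for what a command under test would write.
SAMPLE_OUTPUT = "node1\tlabel\tnode2\nQ1\tP569\t1950.0\nQ2\tP569\t1987.5\n"


class TestOutputContent(unittest.TestCase):
    def test_rows_match_exactly(self):
        expected = [
            {"node1": "Q1", "label": "P569", "node2": "1950.0"},
            {"node1": "Q2", "label": "P569", "node2": "1987.5"},
        ]
        actual = read_edges(SAMPLE_OUTPUT)
        # A length check alone would also pass for two wrong rows.
        self.assertEqual(len(actual), len(expected))
        # Comparing the rows themselves catches wrong values and wrong order.
        self.assertEqual(actual, expected)
```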

3 participants