added skeleton of package for python support #81

skewballfox · 2020-08-31T22:37:01Z

I added a skeleton of a Python library that can be fleshed out to handle the currently supported dataTypes. I can make changes if needed. This isn't meant to be working at the current stage so much as to provide a bit of structure that makes it easier to contribute one aspect of functionality at a time.

python support

insolor · 2020-09-02T07:13:16Z

data-extraction/pyDataExtraction/__main__.py

+from pyDataExtraction.commonTypes.Graph import Graph
+
+if __name__ == "__main__":
+    graph_data1 = {"A": ["B", "C"], "B": ["A,C"], "C": ["A,D"], "D": ["A"]}


I think "A,C" and "A,D" must be separated?

graph_data1 = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "D"], "D": ["A"]}

Multiline formatting of dictionary seems to be more readable

I've mainly been using black for formatting, and it seems to condense it down if the line fits within 88 characters.

Also, the main is currently just showing how this would be used on the Python side, but this will be where TypeScript interfaces with the code, passing information in and getting information back for visualization, so it is going to be reworked extensively once I can get a look at the data format passed from the PyEvaluationEngine.ts file.
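On the formatting point, a small sketch of the corrected data laid out one node per line. It assumes a black release that honours the "magic trailing comma"; with the comma after the last entry, black should leave the dictionary exploded rather than joining it back onto one line.

```python
# Corrected adjacency data from the review, one node per line.
# The trailing comma after the last entry is what asks black to keep
# the dictionary in this multiline form.
graph_data1 = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "D"],
    "D": ["A"],
}
```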

Rdroshan · 2020-09-02T07:59:08Z

data-extraction/pyDataExtraction/commonTypes/Graph.py

+    def __repr__(self):
+        pass
+"""
+# NOTE: ran into issue Node object is not json serializable when ecapsulating in own class


We can add a method in Node class to serialize the data into json.

thanks, I'm just now starting to actually learn about serialization due to this project and a personal rust project.

could you point me in the direction of some documentation on how to do so?

@skewballfox you can simply create a dictionary of the desired form and then serialize it into JSON:

def to_json(self): return json.dumps({'from': self.from_node, 'to': self.to})

(this is an example for the Edge class, you can do this for Node class too)

This is a crude approach, which is inconvenient for classes with many fields, but in this case I think it is quite applicable. Also, this way you can use the from_node field name for the 'from' JSON key.

to_node would also be more readable.
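Putting the two suggestions together, a minimal sketch of what the Edge class could look like. The to_dict helper and the from_node / to_node names are assumptions for illustration, not code from this PR.

```python
import json


class Edge:
    """Directed edge; a sketch, not the class from this PR."""

    def __init__(self, from_node: str, to_node: str):
        # `from` is a reserved word in Python, so the attributes use
        # from_node / to_node and the JSON keys are set by hand below.
        self.from_node = from_node
        self.to_node = to_node

    def to_dict(self) -> dict:
        # Hypothetical helper: plain dict in the shape the visualizer expects.
        return {"from": self.from_node, "to": self.to_node}

    def to_json(self) -> str:
        return json.dumps(self.to_dict())


edge = Edge("A", "B")
print(edge.to_json())  # {"from": "A", "to": "B"}
```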

@insolor this is similar to what I was doing in __repr__, but that definitely helps fix the issue in a way that doesn't involve mangling the language. I don't know why overloading from was my first guess.

Also, I think I figured out why Node wasn't serializable. It was because the __repr__ function doesn't seem to be recursive. As an example, calling print(graph) doesn't automatically behave as if print(node) were called.

Either I could come up with a generic method in DataType that does this, to avoid having to define it in every current and future dataType that uses other objects as components (if present), or we have to overload the function in those cases.

I think I might have an idea using __slots__, I'll have to see though
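One possible generic route, sketched under assumptions: as far as the standard json module goes, the encoder does not consult __repr__ for custom objects and raises TypeError unless a default callable is supplied. Passing default=vars lets it descend into nested objects that keep their state in __dict__. The Node and Graph classes here are stand-ins, not the ones in this PR.

```python
import json


class Node:
    def __init__(self, id, label=None):
        self.id = id
        if label is not None:
            self.label = label


class Graph:
    def __init__(self, nodes):
        self.nodes = nodes

    def to_json(self) -> str:
        # json.dumps calls `default` for every object it cannot encode
        # itself; returning vars(obj) hands it that object's __dict__,
        # which it then encodes recursively.
        return json.dumps(self, default=vars)


graph = Graph([Node("A"), Node("B", label="bee")])
print(graph.to_json())
```

This only works while the objects have a __dict__, so it would need rethinking if the classes move to __slots__.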

mvoitko · 2020-09-02T08:42:44Z

data-extraction/pyDataExtraction/__main__.py

@@ -0,0 +1,10 @@
+from json import dumps


Suggest using import json to follow the "Explicit is better than implicit" principle.

mvoitko · 2020-09-02T08:44:09Z

data-extraction/pyDataExtraction/commonTypes/Graph.py

+
+class Graph(DataType):
+    """
+


Why have empty lines here?

mvoitko · 2020-09-02T08:44:55Z

data-extraction/pyDataExtraction/commonTypes/Graph.py

+        super().__init__()
+        self.kind["graph"] = True
+        # TODO get working for both a dictionary and an nxn array
+        # self.nodes = [Node(node) for node in graph data]


why have commented out code?

I commented it out mainly to have a reference to the other possible implementation, and because I kept going back and forth between different implementations.

I would rather have Nodes and Edges encapsulated in their own class, in case we want to implement different types of graphs.

the commented out code was meant to be deleted after the class was complete. I can go ahead and delete it now though

mvoitko · 2020-09-02T08:45:38Z

data-extraction/pyDataExtraction/commonTypes/Graph.py

+            # self.nodes = [Node(node) for node in graph data]
+            for node in graph_data:
+                self.nodes.append({"id": str(node)})
+                # TODO change prints to log statements


It seems like too minor a change to leave a TODO for rather than implementing it right away.

That's partially because I have no experience implementing logs across a library. I've only implemented them in a single file before, and didn't try to make the log statements differ across classes.
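For what it's worth, the usual pattern for library-wide logging is one module-level logger per file, configured only at the entry point. A minimal sketch; add_node is a hypothetical helper standing in for the loop in Graph.__init__.

```python
import logging

# One logger per module, named after the module. Inside a package this
# yields dotted names that mirror the import path of each file.
logger = logging.getLogger(__name__)


def add_node(nodes: list, node) -> None:
    """Hypothetical helper standing in for the loop in Graph.__init__."""
    nodes.append({"id": str(node)})
    # Lazy %-style formatting: the message is only built if DEBUG is on.
    logger.debug("added node %s", node)


if __name__ == "__main__":
    # Only the entry point configures output; library modules just log.
    logging.basicConfig(level=logging.DEBUG)
    demo_nodes = []
    add_node(demo_nodes, "A")
```

Because each record carries the logger name, messages from different modules can be told apart without writing different log statements per class.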

mvoitko · 2020-09-02T08:46:41Z

data-extraction/pyDataExtraction/commonTypes/Graph.py

+                    # print("edge: ", graph_data[node][edge_i])
+                    self.edges.append({"from": node, "to": edge})
+                    # edge_i += 1
+                # edge_i = 0


why keep commented out code?

mvoitko · 2020-09-02T08:47:46Z

data-extraction/pyDataExtraction/commonTypes/base.py

@@ -0,0 +1,24 @@
+from json import dumps


Suggest just import json for the sake of readability

mvoitko · 2020-09-02T08:48:54Z

data-extraction/pyDataExtraction/commonTypes/base.py

@@ -0,0 +1,24 @@
+from json import dumps
+from abc import ABC, abstractmethod


Is abstractmethod used somewhere?

no, I didn't realize I had left that in there.

mvoitko

Comments mainly about code style

mvoitko · 2020-09-02T09:03:06Z

data-extraction/pyDataExtraction/__main__.py

+    graph_data1 = {"A": ["B", "C"], "B": ["A,C"], "C": ["A,D"], "D": ["A"]}
+    graph_data2 = {1: [2, 3], 2: [1, 3], 3: [1, 4], 4: [1]}
+    graph = Graph(graph_data1)
+    print(graph)


Please use logging with the respective log level.

I'm not sure what you mean. I was mainly using __main__ to verify that the graph implementation works as I was writing it, though that should be moved to a separate directory for testing, as seems to be the standard practice for Python libraries.

I'm not sure if we could use the same test directory as node without pytest throwing errors due to the presence of js files. My experience with pytest and how it works is limited.

mvoitko · 2020-09-02T09:05:50Z

data-extraction/pyDataExtraction/commonTypes/Graph.py

+class Edge:
+
+    def __init__(self,from: str, to: str,):
+        self.fromnode


please use self.from_node

mvoitko · 2020-09-02T09:06:05Z

data-extraction/pyDataExtraction/commonTypes/Graph.py

+"""
+class Edge:
+
+    def __init__(self,from: str, to: str,):


add space after comma

mvoitko · 2020-09-02T09:06:22Z

data-extraction/pyDataExtraction/commonTypes/Graph.py

+        self.fromnode
+
+    def __repr__(self):
+        pass


Why have an empty method?

Because it needs to be implemented. This was mainly meant to be a skeleton of a library, to make it easier for people to contribute to the development of one set of dataTypes. Also, we can't use the dataType's __repr__ method for Edge or Node.

For one, json complains that Node objects aren't serializable when calling __repr__ on graph.

Two, from is a syntax token, and it is also what is expected inside the JSON representation of a graph. So in Edge's case in particular, it's going to require us to either:

do something hacky to use from as a keyword (not my preferred option),

parse the JSON representation of its dict and change every instance of whatever variable we used to represent from, or

handle all the edge-related stuff inside graph (would rather avoid that).

mvoitko · 2020-09-02T09:08:20Z

data-extraction/pyDataExtraction/commonTypes/Graph.py

+    def __init__(self, id: Union[int, str], label: Optional[str] = None):
+        super().__init__()
+        self.id = id
+        if label is None:


Suggest simplifying to:
self.label = label or id

That is actually a hella useful feature I didn't know about. Thanks for that.

Also, given the implementation and what we are passing the information to, this likely wouldn't work, as the attribute would still be listed in the JSON output (at least with the method currently used to produce JSON).

I'm not sure if this would be the case, but I'm trying to avoid all of the nodes being labeled None in the visualizer. So I set it up so that attributes are only present when a value was assigned.

Can you give an example? I don't quite follow.
As I understand it, if we go with
self.label = label or id
then when no label is passed, the default is None and self.label will be populated by id.
Correct?
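A quick sketch of how that expression behaves, with a stand-in Node class (the real one also calls super().__init__()). Note that the label attribute is always set this way, so it would always appear in a __dict__-based JSON dump.

```python
from typing import Optional, Union


class Node:
    """Sketch only; parameter names follow the diff above."""

    def __init__(self, id: Union[int, str], label: Optional[str] = None):
        self.id = id
        # `or` returns the first truthy operand, so a missing label
        # (None) falls back to id. An empty string falls back too.
        self.label = label or id


print(Node("A").label)           # A
print(Node("A", "start").label)  # start
print(vars(Node("A")))           # {'id': 'A', 'label': 'A'}
```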

mvoitko · 2020-09-02T09:11:50Z

data-extraction/pyDataExtraction/commonTypes/Graph.py

+        # self.nodes = [Node(node) for node in graph data]
+        self.nodes = []
+        self.edges = []
+        if isinstance(graph_data, dict):


We should not check the type here. We expect graph data to be a dictionary; otherwise we should raise an exception, not silence it.

Well, no. Leaving aside classes for graphs (just too much trouble to worry about at this moment), you can have a graph represented as a dict or a 2D array.

After getting the dictionary implementation working, I intended to add an elif isinstance(graph_data, list) check, or something similar, explicitly for a list of lists.
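Both concerns can probably be met at once: accept the two representations explicitly and raise for anything else. A sketch with a hypothetical edges_from function; the matrix convention (a truthy cell at row i, column j means an edge from i to j) is an assumption.

```python
from typing import Dict, List, Union

GraphData = Union[Dict[Union[int, str], list], List[list]]


def edges_from(graph_data: GraphData) -> List[dict]:
    """Return edge dicts for an adjacency dict or an n x n matrix."""
    if isinstance(graph_data, dict):
        return [
            {"from": node, "to": neighbour}
            for node, neighbours in graph_data.items()
            for neighbour in neighbours
        ]
    if isinstance(graph_data, list):
        # Assumption: matrix[i][j] is truthy when there is an edge i -> j.
        return [
            {"from": i, "to": j}
            for i, row in enumerate(graph_data)
            for j, cell in enumerate(row)
            if cell
        ]
    # Anything else is an error rather than a silently empty graph.
    raise TypeError(f"unsupported graph data: {type(graph_data).__name__}")
```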

mvoitko · 2020-09-02T09:15:01Z

data-extraction/pyDataExtraction/commonTypes/Graph.py

+                    # print(edge_i)
+                    # print("edge: ", graph_data[node][edge_i])
+                    self.edges.append({"from": node, "to": edge})
+                    # edge_i += 1


for node, edge in graph_data.items():
    self.node.append({"from": node, "to": edge})
    self.edges.append({"from": node, "to": edge})

tested in ipython

for node, edge in graph_data1.items():
    print(node," : ", edge)

output:

A : ['B', 'C']
B : ['A', 'C']
C : ['A', 'D']
D : ['A']

You could do that with a second, inner loop: for edge in edges.
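Spelled out, the two-loop version might look like the sketch below. It uses module-level lists rather than self.nodes / self.edges so that it runs on its own.

```python
graph_data1 = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "D"], "D": ["A"]}

nodes = []
edges = []
for node, neighbours in graph_data1.items():
    nodes.append({"id": str(node)})
    # items() yields the whole neighbour list, so a second loop is
    # needed to emit one edge per neighbour.
    for neighbour in neighbours:
        edges.append({"from": node, "to": neighbour})

print(len(nodes), len(edges))  # 4 7
```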

mvoitko · 2020-09-02T09:15:39Z

data-extraction/pyDataExtraction/commonTypes/Graph.py

+
+
+    Args:
+        DataType (Union[Dict[str,list],Dict[int,list]]): 


add spaces after commas

mvoitko · 2020-09-02T09:16:07Z

data-extraction/pyDataExtraction/__main__.py

+from typing import Union, Dict, Optional
+from abc import ABC, abstractmethod
+from pyDataExtraction.commonTypes.Graph import Graph
+


add newline

We can run flake8 (for linting) and black (for code formatting) locally from the console to make all such changes automatically, and consistency will be maintained across all contributors.

I currently use black for formatting, though I've never used flake8, nor have I used black from the command line. I know there's a precommit plugin for git that would call both on attempted commits to make sure that they are used.

You just have to install black locally (or in the Python virtual environment) and run black <file_name>. This will do the formatting for the file.
As for flake8, it's just so that we can have a consistent style guide for Python across this project, one that follows the PEP 8 guidelines.
Some of the comments by @mvoitko are about exactly this.
Just a suggestion 😄

mvoitko

More comments

mvoitko · 2020-09-02T11:59:04Z

data-extraction/pyDataExtraction/commonTypes/Text.py

+        self.kind["text"] = True
+        self.text = text_data
+        if mimeType is None:
+            self.mimeType = mimeType


what are you trying to achieve here?

ah, that was probably a missing not

mvoitko · 2020-09-02T12:00:13Z

data-extraction/pyDataExtraction/commonTypes/Text.py

+        self.text = text_data
+        if mimeType is None:
+            self.mimeType = mimeType
+        if fileName is None:


I suspect you wanted the opposite:

if file_name: self.file_name = file_name

It may only matter in edge cases far removed from this one, or be a leftover from Python 2, but from what I keep reading, using is None seems to be preferred when dealing with values that may be None.

Also, while editing this, I just remembered why I checked. Because this library currently relies on json.dumps(self.__dict__) to print the JSON representation of an object in the format the plugin expects, I was only conditionally instantiating these variables.

They weren't part of the object unless they were explicitly added to it.

This may be another reason to either find a better method of creating JSON objects, or use super to get the inherited JSON representation and return a version altered in some way.
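One such method, sketched under assumptions: always assign the attributes, then build the JSON payload explicitly and drop unset fields at serialization time. The to_json name is hypothetical, and the mimeType / fileName keys are taken from the parameter names in the diff.

```python
import json
from typing import Optional


class Text:
    """Sketch only; the real class derives from DataType."""

    def __init__(
        self,
        text_data: str,
        mime_type: Optional[str] = None,
        file_name: Optional[str] = None,
    ):
        self.text = text_data
        self.mime_type = mime_type
        self.file_name = file_name

    def to_json(self) -> str:
        payload = {
            "kind": {"text": True},
            "text": self.text,
            "mimeType": self.mime_type,
            "fileName": self.file_name,
        }
        # Drop unset optional fields so they never reach the consumer.
        return json.dumps({k: v for k, v in payload.items() if v is not None})


print(Text("hello").to_json())  # {"kind": {"text": true}, "text": "hello"}
```

Since the keys are written out by hand, the Python-side names no longer have to match the JSON keys.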

mvoitko · 2020-09-02T12:01:51Z

data-extraction/pyDataExtraction/commonTypes/Text.py

+    def __init__(
+        self,
+        text_data: str,
+        mimeType: Optional[str] = None,


let's keep it to the pep-8 python style guide:

mime_type, file_name

The variable names are mainly a result of the use case and of how the output is currently produced.

Right now I'm directly converting the respective dataTypes to JSON objects when they are printed or converted to a string, and the attribute names become the keys inside the JSON string, so I've been matching the case of the specification listed in the Readme.

Though since we already need to do some manipulation of the JSON representation before returning it in the case of the Edge class, doing so here shouldn't be a major issue.

By the way, if we got rid of the reliance on __dict__, we could use __slots__ to slightly improve performance and reduce memory usage.
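A rough sketch of what the __slots__ variant could look like, with a standalone stand-in class. The to_dict helper is hypothetical; also note that if a base class such as DataType does not define __slots__ itself, subclass instances would still get a __dict__.

```python
import json


class Node:
    # With __slots__ (and no base class that provides __dict__) instances
    # carry no per-instance __dict__, so vars(node) is unavailable.
    __slots__ = ("id", "label")

    def __init__(self, id, label=None):
        self.id = id
        if label is not None:
            self.label = label

    def to_dict(self) -> dict:
        # Reading an unassigned slot raises AttributeError, so hasattr
        # filters out the attributes that were never set.
        return {
            name: getattr(self, name)
            for name in self.__slots__
            if hasattr(self, name)
        }

    def to_json(self) -> str:
        return json.dumps(self.to_dict())


print(Node("A").to_json())         # {"id": "A"}
print(Node("A", "start").to_json())  # {"id": "A", "label": "start"}
```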

mvoitko · 2020-09-02T12:02:19Z

data-extraction/pyDataExtraction/commonTypes/Text.py

+    def __init__(
+        self,
+        text_data: str,
+        mimeType: Optional[str] = None,


please name params in snake case:
mime_type, file_name

mvoitko · 2020-09-02T12:02:41Z

data-extraction/pyDataExtraction/commonTypes/Text.py

+    def __init__(
+        self,
+        text_data: str,
+        mimeType: Optional[str] = None,


please use snake case:
mime_type, file_name

added skeleton of package for full python support

91f43d6

hediet mentioned this pull request Sep 2, 2020

what do I need to do to add full python/Language X support? #63

Open

insolor reviewed Sep 2, 2020


Rdroshan reviewed Sep 2, 2020


mvoitko reviewed Sep 2, 2020


skewballfox added 2 commits September 2, 2020 10:04

changes related to comments on pull request

691525c

added comment to __main__ for clarification

607d681
