All notable changes to PyGraphistry are documented in this file. The PyGraphistry client and other Graphistry components are tracked in the main Graphistry major release history documentation.
The changelog format is based on Keep a Changelog. This project adheres to Semantic Versioning, and all PyGraphistry-specific breaking changes are explicitly noted here.
- Fix from_json when json object contains predicates.
- Fix refresh() for SSO
- `featurize()`, on error, coerces `object` dtype cols to `.astype(str)` and retries
- Fix upload-time validation rejecting graphs without a nodes table
- Fix validations import.
- Validations for dataset encodings.
- GFQL: Export shorter alias `e` for `e_undirected`
- Featurize: More auto-dropping of non-numerics when no `dirty_cat`
- GFQL: `hop()` defaults to `debugging_hop=False`
- GFQL: Edge cases around shortest-path multi-hop queries failing to enrich against target nodes during backwards pass
- Pin test env to work around test failures:
'test': ['flake8>=5.0', 'mock', 'mypy', 'pytest'] + stubs + test_workarounds,
+test_workarounds = ['scikit-learn<=1.3.2']
- Skip dbscan tests that require umap when it is not available
- GFQL: GPU acceleration of `chain`, `hop`, `filter_by_dict`
- `AbstractEngine` to `engine.py::Engine` enum
- `compute.typing.DataFrameT` to centralize df-lib-agnostic type checking
- GFQL and more of compute uses generic dataframe methods and threads through engine
- GPU tester threads through LOG_LEVEL
- GFQL `Chain` AST object
- GFQL query serialization - `Chain`, `ASTObject`, and `ASTPredict` implement `ASTSerializable`
  - Ex: `Chain.from_json(Chain([n(), e(), n()]).to_json())`
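
The serialization entry above describes a `to_json()` / `from_json()` round trip. A minimal sketch of that pattern follows, using only the standard library. `ToyStep` and `ToyChain` are hypothetical stand-ins written for illustration; they are not PyGraphistry classes, and the JSON layout shown is an assumption rather than the GFQL wire format.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ToyStep:
    """Hypothetical stand-in for one query step (not a PyGraphistry class)."""
    kind: str  # "node" or "edge"
    filter_dict: Dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> Dict[str, Any]:
        return {"type": self.kind, "filter_dict": self.filter_dict}

    @classmethod
    def from_json(cls, d: Dict[str, Any]) -> "ToyStep":
        return cls(kind=d["type"], filter_dict=dict(d.get("filter_dict", {})))


@dataclass
class ToyChain:
    """Hypothetical stand-in for a chain of steps (assumed JSON layout)."""
    steps: List[ToyStep]

    def to_json(self) -> Dict[str, Any]:
        return {"type": "Chain", "chain": [s.to_json() for s in self.steps]}

    @classmethod
    def from_json(cls, d: Dict[str, Any]) -> "ToyChain":
        return cls(steps=[ToyStep.from_json(s) for s in d["chain"]])


original = ToyChain([
    ToyStep("node"),
    ToyStep("edge", {"type": "transaction"}),
    ToyStep("node"),
])

# Serialize to a JSON string, then rebuild an equal object from it.
wire = json.dumps(original.to_json())
restored = ToyChain.from_json(json.loads(wire))
```

Because both classes are dataclasses, `restored == original` compares field by field, which makes the round trip easy to check in a test.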
- GFQL predicate `is_year_end`
- GFQL in readme.md
- Refactor `ASTEdge`, `ASTNode` field naming convention to match other `ASTSerializable`s
- GFQL `e()` now aliases `e_undirected` instead of the base class `ASTEdge`
- Update readthedocs yml to work around ReadTheDocs v2 yml interpretation regressions
- Make README.md pass markdownlint
- Switch markdownlint docker channel to official and pin
- Neptune: Can now use PyGraphistry OpenCypher/BOLT bindings with Neptune, in addition to existing Gremlin bindings
- chain/hop: `is_in()` membership predicate, `.chain([ n({'type': is_in(['a', 'b'])}) ])`
- hop: optional df queries - `hop(..., source_node_query='...', edge_query='...', destination_node_query='...')`
- chain: optional df queries:
  - `chain([n(query='...')])`
  - `chain([e_forward(..., source_node_query='...', edge_query='...', destination_node_query='...')])`
- `ASTPredicate` base class for filter matching
- Additional predicates for hop and chain match expressions:
  - categorical: is_in (example above), duplicated
  - temporal: is_month_start, is_month_end, is_quarter_start, is_quarter_end, is_year_start, is_year_end, is_leap_year
  - numeric: gt, lt, ge, le, eq, ne, between, isna, notna
  - str: contains, startswith, endswith, match, isnumeric, isalpha, isdigit, islower, isupper, isspace, isalnum, isdecimal, istitle, isnull, notnull
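
The predicate entries above mix plain values and predicate objects inside one match dictionary. The sketch below illustrates that idea in plain Python. The helpers `is_in`, `between`, and `matches` are toy re-implementations that only borrow the names from this changelog; the real predicates operate on dataframes, and their internals are not shown here.

```python
from typing import Any, Callable, Dict, Iterable, List

Predicate = Callable[[Any], bool]


def is_in(options: Iterable[Any]) -> Predicate:
    """Toy membership predicate (illustrative, not the library's is_in)."""
    allowed = set(options)
    return lambda value: value in allowed


def between(lower: float, upper: float) -> Predicate:
    """Toy inclusive range predicate (illustrative only)."""
    return lambda value: value is not None and lower <= value <= upper


def matches(row: Dict[str, Any], filter_dict: Dict[str, Any]) -> bool:
    """A row matches when every key passes its predicate or equals its value."""
    for key, expected in filter_dict.items():
        actual = row.get(key)
        if callable(expected):
            if not expected(actual):
                return False
        elif actual != expected:
            return False
    return True


nodes: List[Dict[str, Any]] = [
    {"id": "a", "type": "account", "score": 5},
    {"id": "b", "type": "wallet", "score": 12},
    {"id": "c", "type": "device", "score": 7},
]

node_filter = {"type": is_in(["account", "wallet"]), "score": between(1, 10)}
selected = [row["id"] for row in nodes if matches(row, node_filter)]
```

Only node `a` satisfies both conditions: `b` fails the range check and `c` fails the membership check.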
- chain/hop: source_node_match was being mishandled when multiple node attributes exist
- chain: backwards validation pass was too permissive; add `target_wave_front` check
- hop: multi-hops with `source_node_match` specified was not checking intermediate hops
- hop: multi-hops reverse validation was mishandling intermediate nodes
- compute logging no longer default-overrides level to DEBUG
- Docker tests support LOG_LEVEL
- refactor: move `is_in`, `IsIn` implementations to `graphistry.ast.predicates`; old imports preserved
- `IsIn` now implements `ASTPredicate`
- Refactor: use `setup_logger(__name__)` more consistently instead of `logging.getLogger(__name__)`
- Refactor: drop unused imports
- Redo `setup_logger()` to activate formatted stream handler iff verbose / LOG_LEVEL
- hop/chain: new query and predicate forms
- hop/chain graph pattern mining tutorial: ipynb demo
- Neptune: Initial tutorial for using PyGraphistry with Amazon Neptune's OpenCypher/BOLT bindings
- igraph: support `compute_igraph('community_optimal_modularity')`
- igraph: `compute_igraph('articulation_points')` labels nodes that are articulation points
- Type error in arrow uploader exception handler
- igraph: default coerce Graph-type node labels to strings, enabling plotting of g.compute_igraph('k_core')
- igraph: fix coercions when using numeric IDs that were confused by igraph swizzling
- dask: Fixed parsing error in hypergraph dask tests
- igraph: Ensure in compute_igraph tests that default mode results coerce to arrow tables
- igraph: Test chaining
- tests: mount source folders to enable dev iterations without rebuilding
- Memgraph: Add tutorial (#507 by https://github.com/karmenrabar)
- Guard against potential `requests` null dereference in uploader error handling
- Add control `register(..., sso_opt_into_type='browser' | 'display' | None)`
- Fix display of SSO URL
- Lint: Update flake8 in test
- AI: UMAP ignores reserved columns and fewer exceptions on low dimensionality
- Lint: Dynamic type checks
- Adding Python 3.10, 3.11 to more of test matrix
- Unpin `setuptools` and `pandas`
- Fix tests that were breaking on pandas 2+
- igraph: Change dependency to new package name before old deprecates (#494)
- Security: Allow token register without org
- Security: Refresh logic
- AI: cuml OOM fix #482
- AI: moves public `g.g_dgl` from KGembed method to private method `g._kg_dgl`
- AI: moves public `g.DGL_graph` to private attribute `g._dgl_graph`
- AI: To return matrices during transform, set the flag: `X, y = g.transform(df, return_graph=False)`; default behavior is `g2 = g.transform(df)`, returning a `Plottable` instance.
- AI: all `transform_*` methods return graphistry Plottable instances, using an infer_graph method. To return matrices, set the `return_graph=False` flag.
- AI: adds `g.get_matrix(**kwargs)` general method to retrieve (sub)-feature/target matrices
- AI: DBSCAN -- `g.featurize().dbscan()` and `g.umap().dbscan()` with options to use UMAP embedding, feature matrix, or subset of feature matrix via `g.dbscan(cols=[...])`
- AI: Demo cleanup using ModelDict & new features, refactoring demos using `dbscan` and `transform` methods.
- Tests: dbscan tests
- AI: Easy import of featurization kwargs for `g.umap(**kwargs)` and `g.featurize(**kwargs)`
- AI: `g.get_features_by_cols` returns featurized submatrix with `col_part` in their columns
- AI: `g.conditional_graph` and `g.conditional_probs` assessing conditional probs and graph
- AI Demos folder: OSINT, CYBER demos
- AI: Full text & semantic search (`g.search(..)` and `g.search_graph(..).plot()`)
- AI: Featurization: support for dataframe columns that are list of lists -> multilabel targets set using `g.featurize(y=['list_of_lists_column'], multilabel=True,...)`
- AI: `g.embed(..)` code for fast knowledge graph embedding (2-layer RGCN) and its usage for link scoring and prediction
- AI: Exposes public methods `g.predict_links(..)` and `g.predict_links_all()`
- AI: automatic naming of graphistry objects during `g.search_graph(query)` -> `g._name = query`
- AI: RGCN demos - Infosec Jupyterthon 2022, SSH anomaly detection
- GIB: Add missing import during group-in-a-box cudf layout of 0-degree nodes
- Tests: SSO login tests catch more unexpected exns
- Personal keys: `register(personal_key_id=..., personal_key_secret=...)`
- SSO: `register()` (no user/pass), `register(idp_name=...)` (org-specific IDP)
- Type errors
- AI: `umap(engine='cuml')` now supports older RAPIDS versions via knn fallback for edge creation. Also: `"umap_learn"`, defaults to `"auto"`
- `prune_self_edges()` to drop any edges where the source and destination are the same
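
As an illustration of the self-edge rule described above, here is a plain-Python sketch that drops edges whose source equals their destination. `drop_self_edges` is a hypothetical helper written for this note; the library method works on the bound edges dataframe, and its implementation is not reproduced here.

```python
from typing import List, Tuple

Edge = Tuple[str, str]  # (source, destination)


def drop_self_edges(edges: List[Edge]) -> List[Edge]:
    """Keep only edges whose source and destination differ."""
    return [(src, dst) for src, dst in edges if src != dst]


edges = [("a", "b"), ("b", "b"), ("b", "c"), ("c", "c")]
pruned = drop_self_edges(edges)
```

The two self-loops (`b -> b` and `c -> c`) are removed while the order of the remaining edges is preserved.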
- Infra: Updated github actions versions and Ubuntu environment for publishing
- AI: full text & semantic search (`g.search(..)` and `g.search_graph(..).plot()`)
- Featurization: support for dataframe columns that are list of lists -> multilabel targets set using `g.featurize(y=['list_of_lists_column'], multilabel=True,...)`
  - Only supports single-column data targets
- Infra: Updated github actions
- `encode_axis()` now correctly sets axis
- work around mypy mistyping operator & on pandas series
- Speed up `g.umap()` >100x by using cuML UMAP engine
- Drop official support for Python 3.6 - its LTS security support stopped 9mo ago
- neo4j: v5 support - backwards-compatible changing derefs from id to element_id
- umap: Optional `engine` parameter (default `cuml`) for `UMAP()`
- ipynb: UMAP purpose, functionality and parameter details, with general UMAP notebook planned in future (features folder)
- has_umap: removed as no longer necessary
- neo4j: v5 support (experimental)
- Infra: suppress igraph pandas FutureWarnings
- Infra: Remove heavy AI dependencies from `pip install graphistry[dev]`
- igraph: Optional `use_vids` parameter (default `False`) for `to_igraph()` and its callers (`layout_igraph`, `compute_graph`)
- igraph: add `coreness` and `harmonic_centrality` to `compute_igraph`
- igraph: CI errors around igraph
- igraph: Tolerate deprecation warning of `clustering`
- Docs: Typos and updates - thanks @gadde5300 + @szhorvat !
- Speed up `import graphistry` 10X+ by lazily importing AI dependencies. Use of `pygraphistry[ai]` features will still trigger slow upstream dependency initialization times upon first use.
- Docs: Update Labs references to Hub
- `group_in_a_box_layout()`: Remove verbose output
- `group_in_a_box_layout()`: Remove synthesized edge weight
- Types: Switch `materialize_nodes` engine param to explicitly using `Engine` typing (no change to untyped user code)
- `g.keep_nodes(List or Series)`
- `g.group_in_a_box_layout(...)`: Both CPU (pandas/igraph) and GPU (cudf/cugraph) versions, and various partitioning/layout/styling settings
- Internal clientside Brewer palettes helper for categorical point coloring
- Infra: CI early fail on deeper lint
- Infra: Move Python 3.6 from core to minimal tests due to sklearn 1.0 incompatibility
- lint
- suppress known dgl bindings test type bug
- `_table_to_arrow()` for `cudf`: Updated for RAPIDS 2022.02+ to handle deprecation of `cudf.DataFrame.hash_columns()` in favor of new `cudf.DataFrame.hash_values()`
- `materialize_nodes()`: Supports `cudf`, materializing a `cudf.DataFrame` nodes table when `._edges` is an instance of `cudf.DataFrame`
- `to_cugraph()`, `from_cugraph()`, `compute_cugraph()`, `layout_cugraph()`
- docs: cugraph demo notebook
- Infra: Update GPU test env settings
- `materialize_nodes`: Return regular index
- `hypergraph()` in dask handles failing metadata type inference
- tests: gpu env tweaks
- tests: umap logging was throwing warnings
- `g.transform()`
- `g.transform_umap()`
- `g.scale()`
- Memoization on UMAP and Featurize calls
- Adds **kwargs and propagates them through to different function calls (featurize, umap, scale, etc)
- Final deprecation of `register(api=2)` protobuf/vgraph mode - also works around need for protobuf test upgrades
- `register(..., org_name='my_org')`: Optionally upload into an organization
- `g.privacy(mode='organization')`: Optionally limit sharing to within your organization
- docs: `org_name` in `README.md` and sharing tutorial
- `compute_igraph()`
- `layout_igraph()`
- `scene_settings()`
- `from_igraph` uses `g._node` instead of `'name'` in more cases
- `g.from_igraph(ig)` will use IDs (ex: strings) for src/dst values instead of igraph indexes
Major version bump due to breaking igraph change
- igraph handlers: `graphistry.from_igraph`, `g.from_igraph`, `g.to_igraph`
- docs: README.md examples of using new igraph methods
- Deprecation warnings in old igraph methods: `g.graph(ig)`, `igraph2pandas`, `pandas2igraph`
- Internal igraph handlers upgraded to use new igraph methods
- `network2igraph` and `igraph2pandas` renamed output node ID column to `_n_implicit` (`constants.NODE`)
- Expose symbols for `.chain()` predicates as top-level: previous `ast` export was incorrect
Major version bump due to large dependency increases for kitchen-sink installs and overall sizeable new feature
- Use buildkit with pip install caching for test dockerfiles
- Graph AI branch: Autoencoding via dirty_cat and sentence_transformers (`g.featurize()`)
- Graph AI branch: UMAP via umap_learn (`g.umap()`)
- Graph AI branch: GNNs via DGL (`g.build_dgl_graph()`)
- `g.reset_caches()` to clear upload and compute caches (last 100)
- Central `setup_logger()`
- Official Python 3.10 support
- Logging: Refactor to `setup_logger(__name__)`
- hypergraph: use default logger instead of DEBUG
- `g.collapse(node='root_id', column='some_col', attribute='some_val')`
- Avoid runtime import exn when on GPU-less systems with cudf/dask_cudf installed
- Docs: `readme.md` digest of compute methods
- `g.edges()` now takes an optional 4th named parameter `edge` ID
  - Code that looks like `g.edges(some_fn, None, None, some_arg)` should now be like `g.edges(some_fn, None, None, None, some_arg)`
- Similar new optional `edge` ID parameter in `g.bind()`
- `g.hop()` now takes optional `return_as_wave_front=False`, primarily for internal use by `chain()`
- `g.chain([...])` with `graphistry.ast.{n, e_forward, e_reverse, e_undirected}`
- Node dictionary-based filtering: `g.filter_nodes_by_dict({"some": "value", "another": 2})`
- Edge dictionary-based filtering: `g.filter_edges_by_dict({"some": "value", "another": 2})`
- Hops support edge filtering: `g.hop(hops=2, edge_match={"type": "transaction"})`
- Hops support pre-node filtering: `g.hop(hops=2, source_node_match={"type": "account"})`
- Hops support post-node filtering: `g.hop(hops=2, destination_node_match={"type": "wallet"})`
- Hops defaults to full graph if no initial nodes specified: `g.hop(hops=2, edge_match={"type": "transaction"})`
- Horizontal and radial axis using `.encode_axis(rows=[...])`
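
To make the dictionary filtering and hop entries above concrete, here is a toy forward traversal over a list of edge dictionaries. `filter_by_dict` and `toy_hop` are illustrative helpers written for this note. They assume exact-equality matching and forward-only steps, and they are not the library's dataframe-based implementation, which may differ in details such as how visited nodes are tracked.

```python
from typing import Any, Dict, Iterable, List, Optional

Row = Dict[str, Any]


def filter_by_dict(rows: List[Row], filter_dict: Dict[str, Any]) -> List[Row]:
    """Keep rows whose values equal every key/value pair in filter_dict."""
    return [r for r in rows if all(r.get(k) == v for k, v in filter_dict.items())]


def toy_hop(edges: List[Row], seeds: Iterable[str], hops: int,
            edge_match: Optional[Dict[str, Any]] = None) -> List[Row]:
    """Collect edges reachable in at most `hops` forward steps from `seeds`."""
    candidates = filter_by_dict(edges, edge_match or {})
    frontier = set(seeds)
    reached: List[Row] = []
    for _ in range(hops):
        step = [e for e in candidates if e["src"] in frontier and e not in reached]
        if not step:
            break
        reached.extend(step)
        frontier = {e["dst"] for e in step}  # next wave starts at new destinations
    return reached


edges = [
    {"src": "a", "dst": "b", "type": "transaction"},
    {"src": "b", "dst": "c", "type": "transaction"},
    {"src": "b", "dst": "d", "type": "login"},
    {"src": "c", "dst": "e", "type": "transaction"},
]

two_hops = toy_hop(edges, seeds=["a"], hops=2, edge_match={"type": "transaction"})
```

With two hops from `a`, the traversal follows `a -> b` and then `b -> c`; the `login` edge is excluded by `edge_match`, and `c -> e` would need a third hop.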
- Docs: Work around sphinx-doc/sphinx#10291
- Better implementation of `.tree_layout(...)` using Sugiyama; good for small/medium DAGs
- Layout rotation method `.rotate(degree)`
- Compute method `.hops(nodes, hops, to_fixed_point, direction)`
- Infra: `test-cpu-local-minimum.sh` accepts params
- Docs: Point color encodings
- Unpin Networkx
- Docs: Removed deprecated `api=1`, `api=2` registration calls (#280 by @pradkrish)
- Docs: Fixed bug in honeypot nb (#279 by @pradkrish)
- Tests: Networkx test version sniffing
- Docs: Sharing control demos/more_examples/graphistry_features/sharing_tutorial.ipynb
- Feature: global `graphistry.privacy()` and compositional `Plotter.privacy()`
- Docs: How to use `privacy()`
- Docs: Start removing deprecated 1.0 API docs
- Fix: NetworkX 2.5+ support - accept minor version tags
- Fix: igraph `.plot()` arrow coercion syntax error (#257)
- Fix: Lint duplicate import warning
- CI: Treat lint warnings as CI failures
- Infra: Add CI stage that installs and tests with minimal core deps (#254)
- Feature: Compute methods `materialize_nodes`, `get_degrees`, `drop_nodes`, `get_topological_levels`
- Feature: Layout methods `tree_layout`, `layout_settings`
- Docs: New compute and layout methods
- Feature: `g.fetch_edges()` for neptune/gremlin edge attributes
- Fix: `g.fetch_nodes()` for neptune/gremlin node attributes
- Docs: Updated demos/for_analysis.ipynb to `api=3`
- Fix: Gremlin (Neptune) connector deduplicates nodes/edges
- Feature: Gremlin connector (GraphSONSerializersV2d0)
- Feature: Cosmos connector
- Feature: Neptune connector
- Feature: Chained composition operators:
  - `g.pipe((lambda g, a1, ...: g2), a1, ...)`
  - `g.edges((lambda g, a1, ...: df), None, None, a1, ...)`
  - `g.nodes((lambda g, a1, ...: df), None, a1, ...)`
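
The `pipe` form above passes the current object as the first argument to a function, followed by any extra arguments, so transformations can be chained. A small sketch of that composition pattern, using a hypothetical `ToyGraph` class rather than the real plotter object:

```python
from typing import Any, Callable, List, Tuple

WeightedEdge = Tuple[str, str, int]  # (source, destination, weight)


class ToyGraph:
    """Hypothetical immutable-style graph holder (not the PyGraphistry Plotter)."""

    def __init__(self, edges: List[WeightedEdge]) -> None:
        self.edges = list(edges)

    def pipe(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        """Call fn(self, *args, **kwargs) and return its result for chaining."""
        return fn(self, *args, **kwargs)


def keep_min_weight(g: ToyGraph, threshold: int) -> ToyGraph:
    """Return a new graph with only edges at or above the weight threshold."""
    return ToyGraph([e for e in g.edges if e[2] >= threshold])


def reverse_edges(g: ToyGraph) -> ToyGraph:
    """Return a new graph with every edge direction flipped."""
    return ToyGraph([(dst, src, w) for src, dst, w in g.edges])


g = ToyGraph([("a", "b", 1), ("b", "c", 5)])
g2 = g.pipe(keep_min_weight, 3).pipe(reverse_edges)
```

Each step returns a new object, so the original `g` is left unchanged and the steps read left to right in the order they run.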
- Feature: plotter::infer_labels: Guess node label names when not set, instead of defaulting to node_id. Runs during plots.
- Infra: Jupyter notebook: `cd docker && docker-compose build jupyter && docker-compose up jupyter`
- Docs: Neptune, Cosmos, chained composition
- Refactor: Split out PlotterBase, interface Plottable
- Fix: Plotter has `hypergraph()`
- Docs: security.md
- Hypergraphs - detect and handle mismatching types across partitions
- Infra: Speedup testing containers via incrementalization and docker settings
- Infra: Update testing container base builds
- Feature: Hypergraphs in dask, dask_cudf modes. Mixed nan support. (#225)
- Feature: Dask/dask_cuda frames can be passed in, which will be .computed(), memoized, and converted to arrow (#225)
- Infra: Test env var controls - WITH_LINT=1, WITH_TYPECHECK=1, WITH_BUILD=1 (#225)
- Docs: Inline hypergraph examples (#225)
- CI: Disable seccomp during test (docker perf) (#225)
- Feature: cudf mode for hypergraph (#224)
- Feature: pandas mode for hypergraph uses all-vectorized operations (#224)
- Infra: Engine class for picking dataframe engine - pandas/cudf/dask/dask_cudf (#224)
- CI: mypy type checking (#222)
- CI: GPU test harness (#223)
- Hypergraph: Uses new pandas/cudf implementations (#224)
- Infra: Issue templates for bugs and feature requests
- Docs: Overhaul Sphinx docs - Update, clean all warnings, add to CI, reject commits that fail
- Docs: Setup.py (pypi) now includes full README.md
- Docs: Added ARCHITECTURE, CONTRIBUTE, and expanded DEVELOP
- Garden: DRY for CI + local dev via shared bin/ scripts
- Docker: Downgrade local dev 3.7 -> 3.6 to more quickly catch minimum version errors
- CI: Now tests building docs (fail on warnings), pypi wheels distro, and neo4j connector
- Changes in setup.py extras_require: 'all' installs more
- Docs: ARCHITECTURE.md and CONTRIBUTE.md
- Quieted memoization fail warning
- CI: Removed TravisCI in favor of GHA
- CD: GHA now handles PyPI publish on tag push
- Docs: Readme install clarifies Python 3.6+
- Docs: Update DEVELOP.md dev flow
- Friendlier error message for calling .cypher(...) without setting BOLT auth/driver (#204)
- CI: Run containerized neo4j connector tests
- Infrastructure: Set Python 3.9 support metadata
- Memoization: When memoize hashes throw exceptions, emit warning and fallback to unmemoized (b7a25c74e)
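
The memoization entry above describes a fallback: if hashing the input fails, warn and compute the result without caching. A minimal standard-library sketch of that behavior follows. `memoized_call` and `slow_len` are hypothetical names, and the cache keyed by the argument itself is an assumption made for illustration; it does not reproduce the library's table hashing.

```python
import warnings
from typing import Any, Callable, Dict, Hashable, List

_result_cache: Dict[Hashable, Any] = {}


def memoized_call(fn: Callable[[Any], Any], arg: Any) -> Any:
    """Return a cached result when possible; fall back to a plain call if unhashable."""
    try:
        if arg in _result_cache:  # hashing happens here and may raise TypeError
            return _result_cache[arg]
    except TypeError as exc:
        warnings.warn(f"Memoization skipped, falling back to unmemoized call: {exc}")
        return fn(arg)
    result = fn(arg)
    _result_cache[arg] = result
    return result


calls: List[Any] = []


def slow_len(value: Any) -> int:
    """Stand-in for an expensive computation; records each real invocation."""
    calls.append(value)
    return len(value)


first = memoized_call(slow_len, (1, 2, 3))   # computed and cached
second = memoized_call(slow_len, (1, 2, 3))  # served from the cache
calls_after_cached = len(calls)
```

Passing an unhashable value such as a list triggers the warning path and still returns the correct result, just without caching.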
- Friendlier error message for api=1, 2 server non-json responses (#187)
- CI: Moved to GitHub Actions for CI + optional manual publish
- CI: Added Python 3.9 to test matrix
- Infrastructure: Upgraded Versioneer to 0.19
- Infrastructure: Fewer warnings and enforce flake8 CI checks
- None known; many small changes to fix warnings so version bump out of caution
- File API: Enable via `.plot(as_files=True)`. By default, auto-skips file re-uploads (disable via `.plot(memoize=False)`) for tables with same hash as those already uploaded in the same session. Use with `.register(api=3)` clients on Graphistry `2.34`+ servers. More details at (#195).
- Dev: More docs and logging as part of #195
- Auth service account docs in README.md (12.2.2020)
- Examples for icons, badges, and new node/edge bindings
- graph-app-kit links
- Slack link
- Python test matrix: Removed 3.9
- Propagate misformatted etl1/2 server errors
- Warnings: Standardizing on Python's warnings.warn
- Neo4j: Improve handling of empty query results (#178)
- Icons: Add new as_text, blend_mode, border, and style options (Graphistry 2.32+)
- Badges: Add new badge encodings (Graphistry 2.32+)
- Python 3.8, 3.9 in test matrix
- New binding shortcuts `g.nodes(df, col)` and `g.edges(df, src_col, dst_col)`
- Python 2.7: Removed future (Python 2.7 has already been EOL so not breaking)
- Redid ipython detection
- Imports: Refactoring for more expected style
- Testing: Fixed most warnings in preparation for treating them as errors
- Testing: Integration tests against self-contained neo4j instance
- Chainable methods `.addStyle()` and `.style()` in `api=3` for controlling foreground, background, logo, and page metadata. Requires Graphistry 2.31.10+ 08eddb8
- Chainable methods `.encode_[point|edge]_[color|icon|size]()` for more powerful complex encodings, and underlying generic handler `__encode()`. Requires Graphistry 2.31.10+ f370ca8
- More usage examples in README.md
- Split `ArrowLoader::*encoding*` methods to `*binding*` vs. `*encoding*` ones to more precisely reflect the protocol. Not considered breaking as an internal method.
- Neo4j 4 temporal and spatial type support - #172
- CHANGELOG.md
- Removed deprecated docker test harness in favor of `docker/` - #172