Releases: stellargraph/stellargraph
Release 1.2.1
StellarGraph is a Python library for machine learning on graphs and networks. It offers state-of-the-art algorithms for graph machine learning, making it easy to discover patterns and answer questions about graph-structured data.
Get started with StellarGraph's newest graph machine learning features with `pip install stellargraph`.
This release is a small bug fix release on top of 1.2.0.
Bug fixes and other changes
- Update the URLs of some datasets (`Cora`, `PubMedDiabetes`, `CiteSeer`) for upstream changes #1738, #1759
- Add two missing layers to the `stellargraph.custom_keras_layers` dictionary #1757
- Experimental changes: rename `RotHEScoring` to `RotHEScore` #1756
- DevOps:
Release 1.2.0
Jump in to this release, with the new and improved demos and examples:
- Comparison of link prediction with random walks based node embedding
- Unsupervised training of a Cluster-GCN model with Deep Graph Infomax
Major features and improvements
- Better Windows support: StellarGraph's existing ability to run on Windows has been improved, with all tests running on CI (#1696) and several small fixes (#1671, #1704, #1705).
- Edge weights are supported in GraphSAGE (#1667) and Watch Your Step (#1604). This is in addition to the existing support for edge weights in GCN, GAT, APPNP, PPNP, RGCN, GCN graph classification, DeepGraphCNN and Node2Vec sampling.
- Better and more demonstration notebooks and documentation to make the library more accessible to new and existing users:
  - A demo notebook for a comparison of link prediction with random walks based node embedding, showing Node2Vec, Attri2Vec, GraphSAGE and GCN #1658
  - The demo notebook for unsupervised training with Deep Graph Infomax has been expanded with more explanation and links #1257
  - The documentation for models, generators and other elements now has many more links to other relevant items in a "See also" box, making it easier to fit pieces together (examples: `GraphSAGE`, `GraphSAGENodeGenerator`, `BiasedRandomWalk`) #1718
- The Cluster-GCN training procedure supports unsupervised training via Deep Graph Infomax; this allows for scalable training of GCN, APPNP and GAT models, and includes connecting to Neo4j for large graphs demo (#1257)
- `KGTripleGenerator` now supports the self-adversarial negative sampling training procedure for knowledge graph algorithms (from RotatE), via `generator.flow(..., sample_strategy="self-adversarial")` docs
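The self-adversarial procedure mentioned above weights each negative sample by how plausible the model currently finds it, so that harder negatives contribute more to the loss. The following is a minimal, dependency-free sketch of that weighting as described in the RotatE paper. It is not StellarGraph's implementation: the function names and the `temperature` parameter are illustrative assumptions, and scores are assumed to be "higher means more plausible".

```python
import math


def softmax(values, temperature=1.0):
    """Softmax over `values`, sharpened or flattened by `temperature`."""
    scaled = [temperature * v for v in values]
    largest = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def self_adversarial_loss(positive_score, negative_scores, temperature=1.0):
    """Loss for one positive edge and its sampled negatives (hypothetical helper).

    Each negative is weighted by the softmax of its own score, so negatives
    that the model wrongly rates as plausible dominate the negative term.
    """
    weights = softmax(negative_scores, temperature)
    negative_term = sum(
        w * log_sigmoid(-s) for w, s in zip(weights, negative_scores)
    )
    return -log_sigmoid(positive_score) - negative_term
```

With negative scores `[2.0, -1.0, 0.0]`, the first negative receives the largest weight because the model rates it as the most plausible of the three.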
Deprecations
- The `ClusterGCN` model has been replaced with the `GCN` class. In the previous 1.1.0 release, GCN, APPNP and GAT were generalised to support the Cluster-GCN training procedure via `ClusterNodeGenerator` (which includes Neo4j support). The `ClusterGCN` model is now redundant and thus is deprecated: however, it still works without behaviour change.
Experimental features
Some new algorithms and features are still under active development, and are available as an experimental preview. However, they may not be easy to use: their documentation or testing may be incomplete, and they may change dramatically from release to release. The experimental status is noted in the documentation and at runtime via prominent warnings.
- `RotE`, `RotH`: knowledge graph link prediction algorithms that combine TransE and RotatE in Euclidean or hyperbolic space, respectively #1539
Bug fixes and other changes
- There are now tests for saving and loading a Keras `Model` constructed from every model in StellarGraph #1676. This includes fixes for some models (#1677, #1682). Known issues: sparse models such as GCN and RGCN (see #1251 for more info and a work-around using `tf-nightly`), experimental GCN-LSTM (#1681).
- Various documentation, demo and error message fixes and improvements: better internal linking #1404, automated spell checking #1583, #1663, #1665, #1684, improved rendering #1722 including a better sidebar #1512, #1729, #1730
- DevOps changes:
Release 1.1.0
Jump in to this release, with the new and improved demos and examples:
- Neo4j graph database support: Cluster-GCN, GraphSAGE, all demos
- Semi-supervised node classification via GCN, Deep Graph Infomax and fine-tuning
- Loading data into StellarGraph from NumPy
- Link prediction with Metapath2Vec
- Unsupervised graph classification/representation learning via distances
- RGCN section of Node representation learning with Deep Graph Infomax
- Node2Vec with StellarGraph components: representation learning, node classification
- Expanded Attri2Vec explanation: representation learning, node classification, link prediction
Major features and improvements
- Support for the Neo4j graph database has been significantly improved:
  - There is now a `Neo4jStellarGraph` class that packages up a connection to a Neo4j instance, and allows it to be used for machine learning algorithms including the existing Neo4j and GraphSAGE functionality demo, #1595, #1598.
  - The `ClusterNodeGenerator` class now supports `Neo4jStellarGraph` in addition to the in-memory `StellarGraph` class, allowing it to be used to train models like GCN and GAT with data stored entirely in Neo4j demo (#1561, #1594, #1613)
- Better and more demonstration notebooks and documentation to make the library more accessible to new and existing users:
- There is now a glossary that explains some terms specific to graphs, machine learning and graph machine learning #1570
- A new demo notebook for semi-supervised node classification using Deep Graph Infomax and GCN #1587
- A new demo notebook for link prediction using the Metapath2Vec algorithm #1614
- New algorithms:
- Unsupervised graph representation learning demo (#1626)
- Unsupervised RGCN with Deep Graph Infomax demo (#1258)
- Native Node2Vec using Tensorflow Keras, not the gensim library, demo of representation learning, demo of node classification (#536, #1566)
- The `ClusterNodeGenerator` class can be used to train GCN, GAT, APPNP and PPNP models in addition to the ClusterGCN model #1585
- The `StellarGraph` class continues to get smaller, faster and more flexible:
  - Node features can now be specified as NumPy arrays or the newly added thin `IndexedArray` wrapper, which does no copies and has minimal runtime overhead demo (#1535, #1556, #1599). They can also now be multidimensional for each node #1561.
  - Edges can now have features, taken as any extra/unused columns in the input DataFrames demo (#1574)
  - Adjacency lists used for random walks and GraphSAGE/HinSAGE are constructed with NumPy and stored as contiguous arrays instead of dictionaries, cutting the time and memory of construction by an order of magnitude #1296
  - The peak memory usage of construction and adjacency list building is now monitored to ensure that there are no large spikes for large graphs that exceed available memory #1546. This peak usage has thus been optimised: #1551,
  - Other optimisations: the `edge_arrays`, `neighbor_arrays`, `in_node_arrays` and `out_node_arrays` methods have been added, reducing time and memory overhead by leaving data as its underlying NumPy array #1253; the `node_type` method now supports multiple nodes as input, making algorithms like HinSAGE and Metapath2Vec much faster #1452; the default edge weight of 1 no longer consumes significant memory #1610.
- Overall performance and memory usage improvements since 1.0.0, in numbers:
  - A reddit graph has 233 thousand nodes and 11.6 million edges:
    - construction without node features is now 2.3× faster, uses 31% less memory and has a memory peak 57% smaller.
    - construction with node features from NumPy arrays is 6.8× faster, uses 6.5% less memory overall and 85% less new memory (the majority of the memory is shared with the original NumPy arrays), and has a memory peak (above the raw data set) 70% smaller, compared to Pandas DataFrames in 1.0.0.
    - adjacency lists are 4.7-5.0× faster to construct, use 28% less memory and have a memory peak 60% smaller.
  - Various random walkers are faster: `BiasedRandomWalk` is up to 30× faster with weights and 5× faster without weights on MovieLens and up to 100× faster on some synthetic datasets, `UniformRandomMetapathWalk` is up to 17× faster (on MovieLens), `UniformRandomWalk` is up to 1.4× faster (on MovieLens).
- Tensorflow 2.2 and thus Python 3.8 are now supported #1278
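To illustrate the contiguous adjacency storage described above, here is a small sketch that packs neighbour lists into two flat arrays (an offsets array and a neighbours array) rather than a dictionary of lists. It uses Python's standard `array` module in place of NumPy to stay dependency-free. The function names are hypothetical and the library's internal layout may differ.

```python
from array import array


def build_adjacency_arrays(num_nodes, edges):
    """Pack undirected adjacency lists into two flat arrays (CSR-style).

    `edges` is a list of (source, target) pairs of integer node indices.
    Node i's neighbours end up in flat[offsets[i]:offsets[i + 1]].
    """
    degree = [0] * num_nodes
    for src, dst in edges:
        degree[src] += 1
        degree[dst] += 1

    # Prefix sums of the degrees give each node's slice boundaries.
    offsets = array("q", [0] * (num_nodes + 1))
    for node in range(num_nodes):
        offsets[node + 1] = offsets[node] + degree[node]

    flat = array("q", [0] * offsets[num_nodes])
    cursor = list(offsets[:num_nodes])  # next free slot for each node
    for src, dst in edges:
        flat[cursor[src]] = dst
        cursor[src] += 1
        flat[cursor[dst]] = src
        cursor[dst] += 1
    return offsets, flat


def neighbours(offsets, flat, node):
    """Return the neighbours of `node` as a plain list."""
    return list(flat[offsets[node]:offsets[node + 1]])
```

A lookup is then a single slice of one array, with no per-node Python objects to allocate or hash.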
Experimental features
Some new algorithms and features are still under active development, and are available as an experimental preview. However, they may not be easy to use: their documentation or testing may be incomplete, and they may change dramatically from release to release. The experimental status is noted in the documentation and at runtime via prominent warnings.
- `RotatE`: a knowledge graph link prediction algorithm that uses complex rotations (`|z| = 1`) to encode relations #1522
- `GCN_LSTM` (renamed from `GraphConvolutionLSTM`): time series prediction on spatio-temporal data. It is still experimental, but has been improved since the last release:
  - the `SlidingFeaturesNodeGenerator` class has been added to yield data appropriate for the model, straight from a `StellarGraph` instance containing time series data as node features #1564
  - the hidden graph convolution layers can now have a custom output size #1555
  - the model now supports multivariate input and output, incl...
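As a rough illustration of the complex rotations used by RotatE, the sketch below represents each relation as a list of phases, turns them into unit-modulus complex numbers, and measures how far the rotated head embedding lands from the tail embedding. The function names are hypothetical and the choice of summing per-component moduli is an assumption. This is not the library's implementation.

```python
import cmath
import math


def unit_rotations(phases):
    """Turn phases (radians) into complex numbers with |z| = 1."""
    return [cmath.exp(1j * phase) for phase in phases]


def rotate_distance(head, relation_phases, tail):
    """Distance between the rotated head and the tail (smaller is more plausible)."""
    rotations = unit_rotations(relation_phases)
    return sum(abs(h * r - t) for h, r, t in zip(head, rotations, tail))


# A relation that rotates the first component by 90 degrees and the second by 180.
head = [1 + 0j, 0 + 1j]
phases = [math.pi / 2, math.pi]
matching_tail = [0 + 1j, 0 - 1j]
distance = rotate_distance(head, phases, matching_tail)  # approximately zero
```

Because every rotation has modulus 1, a relation only changes the angle of each embedding component, never its length.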
Release 1.0.0
This 1.0 release of StellarGraph is the culmination of three years of active research and engineering to deliver an open-source, user-friendly library for machine learning (ML) on graphs and networks.
Jump in to this release, with the new demos and examples:
- More helpful indexing and guidance for demos in our API documentation
- Loading from Neo4j
- More explanatory Node2Vec link prediction
- Unsupervised `GraphSAGE` and `HinSAGE` via `DeepGraphInfomax`
- Graph classification with `GCNSupervisedGraphClassification` and with `DeepGraphCNN`
- Time series prediction using spatial information, using `GraphConvolutionLSTM` (experimental)
Major features and improvements
- Better demonstration notebooks and documentation to make the library more accessible to new and existing users:
- Notebooks are now published in the API documentation, for better & faster rendering and more convenient access #1279 #1433 #1448
- The demos indices and READMEs now contain more guidance and explanation to make it easier to find a relevant example #1200
- Several demos have been added or rewritten: loading data from Neo4j #1184, link prediction using Node2Vec #1190, graph classification with GCN, graph classification with DGCNN
- Notebooks now detect if they're being used with an incorrect version of the StellarGraph library, eliminating confusion about version mismatches #1242
- Notebooks are easier to download, both individually via a button on each in the API documentation #1460 and in bulk #1377 #1459
- Notebooks have been re-arranged and renamed to be more consistent and easier to find #1471
- New algorithms:
  - `GCNSupervisedGraphClassification`: supervised graph classification model based on Graph Convolutional layers (GCN) #929, demo.
  - `DeepGraphCNN` (DGCNN): supervised graph classification using a stack of graph convolutional layers followed by `SortPooling`, and standard convolutional and pooling layers (such as `Conv1D` and `MaxPool1D`) #1212 #1265, demo
  - `SortPooling` layer: the node pooling layer introduced in Zhang et al #1210
  - `DeepGraphInfomax` can be used to train almost any model in an unsupervised way, via the `corrupt_index_groups` parameter to `CorruptedGenerator` #1243, demo. Additionally, many algorithms provide defaults and so can be used with `DeepGraphInfomax` without specifying this parameter:
- `UnsupervisedSampler` supports a `walker` parameter to use other random walking algorithms such as `BiasedRandomWalk`, in addition to the default `UniformRandomWalk`. #1187
- The `StellarGraph` class is now smaller, faster and easier to construct and use:
  - The `StellarGraph(..., edge_type_column=...)` parameter can be used to construct a heterogeneous graph from a single flat `DataFrame`, containing a column of the edge types #1284. This avoids the need to build separate `DataFrame`s for each type, and is significantly faster when there are many types. Using `edge_type_column` gives a 2.6× speedup for loading the `stellargraph.datasets.FB15k` dataset (with almost 600 thousand edges across 1345 types).
  - `StellarGraph`'s internal cache of node adjacencies is now computed lazily #1291 and takes into account whether the graph is directed or not #1463, and they now use the smallest integer type they can #1289
  - `StellarGraph`'s internal lists of source and target nodes are now stored using integer "ilocs" #1267, reducing memory use and making some functionality significantly faster (#1444 #1446)
  - Functions like `graph.node_features()` no longer need `node_type` specified if `graph` has only one node type (this includes classes like `HinSAGENodeGenerator`, which no longer needs `head_node_type` if there is only one node type) #1375
- Overall performance and memory usage improvements since 0.11, in numbers:
  - The FB15k graph has 15 thousand nodes and 483 thousand edges: it is now 7× faster and 4× smaller to construct (without adjacency lists). It is still about 2× smaller when directed or undirected adjacency lists are computed.
  - Directed adjacency matrix construction is up to 2× faster
  - Various samplers and random walkers are faster: `HinSAGENodeGenerator` is 3× faster (on `MovieLens`), `Attri2VecNodeGenerator` is 4× faster (on `CiteSeer`), weighted `BiasedRandomWalk` is up to 3× faster, `UniformRandomMetapathWalk` is up to 7× faster
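For readers unfamiliar with the `SortPooling` layer listed above, the following plain-Python sketch shows the idea from Zhang et al: order a graph's nodes by their last feature channel (breaking ties with earlier channels), then truncate or zero-pad to a fixed number of rows `k` so that graphs of different sizes produce same-shaped outputs. It works on a single graph given as a list of rows. The function name is hypothetical, and the real layer operates on batched tensors and may differ in detail.

```python
def sort_pooling(node_features, k):
    """Sort rows by last channel (descending) and return exactly k rows.

    `node_features` is a list of equal-length feature rows, one per node.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    width = len(node_features[0]) if node_features else 0
    # Compare on the last channel first, then the second-to-last, and so on.
    ordered = sorted(
        node_features, key=lambda row: tuple(reversed(row)), reverse=True
    )
    kept = [list(row) for row in ordered[:k]]
    # Pad with zero rows when the graph has fewer than k nodes.
    padding = [[0.0] * width for _ in range(k - len(kept))]
    return kept + padding


features = [[1.0, 0.2], [2.0, 0.9], [3.0, 0.5]]
top_two = sort_pooling(features, 2)
padded = sort_pooling(features, 4)
```

The fixed-size output is what lets standard layers such as `Conv1D` follow the graph convolutions.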
Breaking changes
- The `stellargraph/stellargraph` docker image wasn't being published in an optimal way, so we have stopped updating it for now #1455
- Edge weights are now validated to be numeric when creating a `StellarGraph`. Previously edge weights could be any type, but all algorithms that use them would fail with non-numeric types. #1191
- Full batch layers no longer support an "output indices" tensor to filter the output rows to a selected set of nodes #1204 (this does not affect models like `GCN`, only the layers within them: `APPNPPropagationLayer`, `ClusterGraphConvolution`, `GraphConvolution`, `GraphAttention`, `GraphAttentionSparse`, `PPNPPropagationLayer`, `RelationalGraphConvolution`). Migration: post-process the output using `tf.gather` manually or the new `sg.layer.misc.GatherIndices` layer.
- `GraphConvolution` has been generalised to work with batch size > 1, subsuming the functionality of the now-deprecated `ClusterGraphConvolution` (and `GraphClassificationConvolution`) #1205. Migration: replace `stellargraph.layer.ClusterGraphConvolution` with `stellargraph.layer.GraphConvolution`.
- `BiasedRandomWalk` now takes multi-edges into consideration instead of collapsing them when traversing the graph. It previously required that all edges in a multi-edge had the same weight and only counted one of them when considering where to walk, but now a multi-edge is equivalent to having an edge whose weight is the sum of the weights of all edges in the multi-edge #1444
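The new multi-edge behaviour of `BiasedRandomWalk` can be illustrated with a simplified, first-order weighted step. The sketch below ignores the walk's return and in-out bias parameters and treats edges as directed, both of which are simplifying assumptions, and the function name is hypothetical. It shows that parallel edges behave exactly like one edge carrying their summed weight.

```python
from collections import defaultdict


def step_probabilities(edges, node):
    """Probability of stepping from `node` to each neighbour.

    `edges` is a list of (source, target, weight) triples and may contain
    several entries for the same (source, target) pair, i.e. a multi-edge.
    """
    totals = defaultdict(float)
    for source, target, weight in edges:
        if source == node:
            totals[target] += weight  # parallel edges accumulate
    norm = sum(totals.values())
    if norm == 0:
        return {}  # no outgoing weight, so the walk cannot continue
    return {target: weight / norm for target, weight in totals.items()}


multi = [("a", "b", 1.0), ("a", "b", 2.0), ("a", "c", 1.0)]
collapsed = [("a", "b", 3.0), ("a", "c", 1.0)]
```

Both edge lists give the same distribution for a step from `"a"`: 0.75 to `"b"` and 0.25 to `"c"`.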
Experimental features
Some new algorithms and features are still under active development, and are available as an experimental preview. However, they may not be easy to use: their documentation or testing may be incomplete, and they may change dramatically from release to release. The experimental status is noted in the documentation and at runtime via prominent warnings.
- `GraphConvolutionLSTM`: time series prediction on spatio-temporal data, combining GCN with an LSTM model to augment the conventional time-series model with information from nearby data points #1085, demo
Bug fixes and other changes
- Random walk classes like `UniformR...
Release 1.0.0rc1
This is the first release candidate for StellarGraph 1.0. The 1.0 release will be the culmination of 2 years of active development, and this release candidate is the first milestone for that release.
Jump in to this release, with the new demos and examples:
- More helpful indexing and guidance in demo READMEs
- Loading from Neo4j
- More explanatory Node2Vec link prediction
- Unsupervised `GraphSAGE` and `HinSAGE` via `DeepGraphInfomax`
- Graph classification with `GCNSupervisedGraphClassification`
- Time series prediction using spatial information, using `GraphConvolutionLSTM` (experimental)
Major features and improvements
- Better demonstration notebooks and documentation to make the library more accessible to new and existing users:
- The demos READMEs now contain more guidance and explanation to make it easier to find a relevant example #1200
- A demo for loading data from Neo4j has been added #1184
- The demo for link prediction using Node2Vec has been rewritten to be clearer #1190
- Notebooks are now included in the API documentation, for more convenient access #1279
- Notebooks now detect if they're being used with an incorrect version of the StellarGraph library, eliminating confusion about version mismatches #1242
- New algorithms:
  - `DeepGraphInfomax` can be used to train almost any model in an unsupervised way, via the `corrupt_index_groups` parameter to `CorruptedGenerator` #1243, demo. Additionally, many algorithms provide defaults and so can be used with `DeepGraphInfomax` without specifying this parameter:
- `UnsupervisedSampler` supports a `walker` parameter to use other random walking algorithms such as `BiasedRandomWalk`, in addition to the default `UniformRandomWalk`. #1187
- The `StellarGraph` class is now smaller, faster and easier to construct:
  - The `StellarGraph(..., edge_type_column=...)` parameter can be used to construct a heterogeneous graph from a single flat `DataFrame`, containing a column of the edge types #1284. This avoids the need to build separate `DataFrame`s for each type, and is significantly faster when there are many types. Using `edge_type_column` gives a 2.6× speedup for loading the `stellargraph.datasets.FB15k` dataset (with almost 600 thousand edges across 1345 types).
  - `StellarGraph`'s internal cache of node adjacencies now uses the smallest integer type it can #1289. This reduces memory use by 31% on the `FB15k` dataset, and 36% on a reddit dataset (with 11.6 million edges).
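The "smallest integer type" optimisation above can be sketched as follows: given the largest node index that must be stored, pick the narrowest fixed-width integer that can hold it. This is a dependency-free illustration with hypothetical function names. Whether the library uses unsigned types, and exactly how it selects them, is an assumption here.

```python
def smallest_unsigned_bits(max_value):
    """Return the narrowest of 8/16/32/64 bits that can hold `max_value`."""
    if max_value < 0:
        raise ValueError("max_value must be non-negative")
    for bits in (8, 16, 32, 64):
        if max_value < 2 ** bits:
            return bits
    raise ValueError("max_value does not fit in 64 bits")


def index_bytes(num_nodes, num_entries):
    """Bytes needed to store `num_entries` node indices in [0, num_nodes)."""
    bits = smallest_unsigned_bits(max(num_nodes - 1, 0))
    return num_entries * bits // 8
```

For example, indices into a graph with 14951 nodes fit in 16 bits, a quarter of the width of a 64-bit integer. NumPy offers `numpy.min_scalar_type` for the same kind of selection.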
Breaking changes
- Edge weights are now validated to be numeric when creating a `StellarGraph`; previously edge weights could be any type, but all algorithms that use them would fail. #1191
- Full batch layers no longer support an "output indices" tensor to filter the output rows to a selected set of nodes #1204 (this does not affect models like `GCN`, only the layers within them: `APPNPPropagationLayer`, `ClusterGraphConvolution`, `GraphConvolution`, `GraphAttention`, `GraphAttentionSparse`, `PPNPPropagationLayer`, `RelationalGraphConvolution`). Migration: post-process the output using `tf.gather` manually or the new `sg.layer.misc.GatherIndices` layer.
- `GraphConvolution` has been generalised to work with batch size > 1, subsuming the functionality of the now-deprecated `ClusterGraphConvolution` (and `GraphClassificationConvolution`) #1205. Migration: replace `stellargraph.layer.ClusterGraphConvolution` with `stellargraph.layer.GraphConvolution`.
Experimental features
Some new algorithms and features are still under active development, and are available as an experimental preview. However, they may not be easy to use: their documentation or testing may be incomplete, and they may change dramatically from release to release. The experimental status is noted in the documentation and at runtime via prominent warnings.
- `SortPooling` layer: the node pooling layer introduced in Zhang et al #1210
- `DeepGraphConvolutionalNeuralNetwork` (DGCNN): supervised graph classification using a stack of graph convolutional layers followed by `SortPooling`, and standard convolutional and pooling layers (such as `Conv1D` and `MaxPool1D`) #1212 #1265
- `GraphConvolutionLSTM`: time series prediction on spatio-temporal data, combining GCN with an LSTM model to augment the conventional time-series model with information from nearby data points #1085, demo
Bug fixes and other changes
- Random walk classes like `UniformRandomWalk` and `BiasedRandomWalk` can have their hyperparameters set on construction, in addition to in each call to `run` #1179
- Node feature sampling was made ~4× faster by ensuring a better data layout; this makes some configurations of `GraphSAGE` (and `HinSAGE`) noticeably faster #1225
- The `PROTEINS` dataset has been added to `stellargraph.datasets`, for graph classification #1282
- The `BlogCatalog3` dataset can now be successfully downloaded again #1283
- Knowledge graph model evaluation via `rank_edges_against_all_nodes` now defaults to the `random` strategy for breaking ties, and supports `top` (previous default) and `bottom` as alternatives #1223
- Creating a `RelationalFullBatchNodeGenerator` is now significantly faster and requires much less memory (18× speedup and 560× smaller for the `stellargraph.datasets.AIFB` dataset) #1274
- `StellarGraph.info` now shows a summary of the edge weights for each edge type #1240
- Various documentation, demo and error message fixes and improvements: #1141, #1219, #1246, #1260, #1266
- DevOps changes:
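The tie-breaking strategies named above matter when several candidate edges receive exactly the same score as the true edge. The sketch below shows one plausible reading of the three strategies when computing a rank (1 is best). The function name is hypothetical, and the interpretation of `top` as "true edge placed ahead of its ties" and `bottom` as "placed behind them" is an assumption rather than a statement of the library's behaviour.

```python
import random


def rank_with_ties(true_score, other_scores, tie_breaking="random", rng=None):
    """Rank of the true edge among the other candidates, 1 being best."""
    greater = sum(1 for score in other_scores if score > true_score)
    equal = sum(1 for score in other_scores if score == true_score)
    if tie_breaking == "top":
        return greater + 1  # optimistic: ahead of every tied candidate
    if tie_breaking == "bottom":
        return greater + equal + 1  # pessimistic: behind every tied candidate
    if tie_breaking == "random":
        rng = rng or random
        return greater + rng.randint(0, equal) + 1  # uniform among the ties
    raise ValueError(f"unknown tie_breaking strategy: {tie_breaking!r}")
```

A model that outputs a constant score for every edge would look perfect under the optimistic strategy and worthless under the pessimistic one. Random tie-breaking avoids rewarding that degenerate case.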
Release 0.11.1
This bugfix release contains the same code as 0.11.0, and just fixes the metadata in the Anaconda package so that it can be installed successfully.
Bug fixes and other changes
- The Conda package for StellarGraph has been updated to require TensorFlow 2.1, as TensorFlow 2.0 is no longer supported. As a result, StellarGraph will currently install via Conda on Linux and Windows - Mac support is waiting on the Tensorflow 2.1 osx-64 release to Conda. #1165
Release 0.11.0
Major features and improvements
- The onboarding/getting-started process has been optimised and improved:
- The README has been rewritten to highlight our numerous demos, and how to get help #1081
- Example Jupyter notebooks can now be run directly in Google Colab and Binder, providing an easy way to get started with StellarGraph - simply click the Colab and Binder badges within each notebook. #1119.
- The new `demos/basics` directory contains two notebooks demonstrating how to construct a `StellarGraph` object from Pandas, and from NetworkX #1074
- The GCN node classification demo now has more explanation, to serve as an introduction to graph machine learning using StellarGraph #1125
- New algorithms:
- Watch Your Step: computes node embeddings by simulating the effect of random walks, rather than doing them. #750.
- Deep Graph Infomax: performs unsupervised node representation learning #978.
- Temporal Random Walks (Continuous-Time Dynamic Network Embeddings): random walks that respect the time that each edge occurred (stored as edge weights) #1120.
- ComplEx: computes multiplicative complex-number embeddings for entities and relationships (edge types) in knowledge graphs, which can be used for link prediction. #901 #1080
- DistMult: computes multiplicative real-number embeddings for entities and relationships (edge types) in knowledge graphs, which can be used for link prediction. #755 #865 #1136
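The difference between the real-number embeddings of DistMult and the complex-number embeddings of ComplEx can be seen in their scoring functions. The sketch below uses plain Python lists and the standard formulas from the respective papers. The function names are illustrative and this is not the library's implementation.

```python
def distmult_score(subject, relation, obj):
    """DistMult: sum of element-wise products of three real vectors."""
    return sum(s * r * o for s, r, o in zip(subject, relation, obj))


def complex_score(subject, relation, obj):
    """ComplEx: real part of the product with the object embedding conjugated."""
    total = sum(
        s * r * o.conjugate() for s, r, o in zip(subject, relation, obj)
    )
    return total.real


# DistMult is symmetric: swapping subject and object gives the same score.
forward = distmult_score([1.0, 2.0], [3.0, 4.0], [5.0, 6.0])
backward = distmult_score([5.0, 6.0], [3.0, 4.0], [1.0, 2.0])

# ComplEx can score the two directions differently.
s, r, o = [1 + 2j], [0 + 1j], [3 - 1j]
complex_forward = complex_score(s, r, o)
complex_backward = complex_score(o, r, s)
```

The conjugate is what lets ComplEx model asymmetric relations, which DistMult's purely multiplicative real score cannot distinguish.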
Breaking changes
- StellarGraph now requires TensorFlow 2.1 or greater; TensorFlow 2.0 is no longer supported #1008
- The legacy constructor using NetworkX graphs has been deprecated #1027. Migration: replace `StellarGraph(some_networkx_graph)` with `StellarGraph.from_networkx(some_networkx_graph)`, and similarly for `StellarDiGraph`.
- The `build` method on model classes (such as `GCN`) has been renamed to `in_out_tensors` #1140. Migration: replace `model.build()` with `model.in_out_tensors()`.
- The `node_model` and `link_model` methods on model classes have been replaced by `in_out_tensors` #1140 (see that PR for the exact list of types). Migration: replace `model.node_model()` with `model.in_out_tensors()` or `model.in_out_tensors(multiplicity=1)`, and `model.link_model()` with `model.in_out_tensors()` or `model.in_out_tensors(multiplicity=2)`.
- Re-exports of calibration and ensembling functionality from the top-level of the `stellargraph` module were deprecated, in favour of importing from the `stellargraph.calibration` or `stellargraph.ensemble` submodules directly #1107. Migration: replace uses of `stellargraph.Ensemble` with `stellargraph.ensemble.Ensemble`, and similarly for the other names (see #1107 for all replacements).
- `StellarGraph.to_networkx` parameters now use `attr` to refer to NetworkX attributes, not `name` or `label` #973. Migration: for any named parameters in `graph.to_networkx(...)`, change `node_type_name=...` to `node_type_attr=...` and similarly `edge_type_name` to `edge_type_attr`, `edge_weight_label` to `edge_weight_attr`, `feature_name` to `feature_attr`.
- `StellarGraph.nodes_of_type` is deprecated in favour of the `nodes` method #1111. Migration: replace `some_graph.nodes_of_type(some_type)` with `some_graph.nodes(node_type=some_type)`.
- `StellarGraph.info` parameters `show_attributes` and `sample` were deprecated #1110
- Some more layers and models had many parameters move from `**kwargs` to real arguments: `Attri2Vec` (#1128), `ClusterGCN` (#1129), `GraphAttention` & `GAT` (#1130), `GraphSAGE` & its aggregators (#1142), `HinSAGE` & its aggregators (#1143), `RelationalGraphConvolution` & `RGCN` (#1148). Invalid (e.g. incorrectly spelled) arguments would have been ignored previously, but now may fail with a `TypeError`; to fix, remove or correct the arguments.
- The `method="chebyshev"` option to `FullBatchNodeGenerator`, `FullBatchLinkGenerator` and `GCN_Aadj_feats_op` has been removed for now, because it needed significant revision to be correctly implemented #1028
- The `fit_generator`, `evaluate_generator` and `predict_generator` methods on `Ensemble` and `BaggingEnsemble` have been renamed to `fit`, `evaluate` and `predict`, to match the deprecation in TensorFlow 2.1 of the `tensorflow.keras.Model` methods of the same name #1065. Migration: remove the `_generator` suffix on these methods.
- The `default_model` method on `Attri2Vec`, `GraphSAGE` and `HinSAGE` has been deprecated, in favour of `in_out_tensors` #1145. Migration: replace `model.default_model()` with `model.in_out_tensors()`.
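The move from `**kwargs` to real arguments described above changes how misspelled parameters behave. The following sketch uses two hypothetical stand-in classes (not StellarGraph classes) to show the difference: a constructor that reads options out of `**kwargs` silently ignores a typo, while one with explicit keyword parameters rejects it with a `TypeError`.

```python
class LayerWithKwargs:
    """Hypothetical old-style class: options are read out of **kwargs."""

    def __init__(self, units, **kwargs):
        self.units = units
        # A misspelled key is never looked up, so the default is used silently.
        self.activation = kwargs.get("activation", "relu")


class LayerWithRealArguments:
    """Hypothetical new-style class: options are explicit keyword parameters."""

    def __init__(self, units, activation="relu"):
        self.units = units
        self.activation = activation


# Note the typo: "activaton" instead of "activation".
old_style = LayerWithKwargs(16, activaton="tanh")  # accepted, typo ignored

try:
    LayerWithRealArguments(16, activaton="tanh")
    typo_rejected = False
except TypeError:
    typo_rejected = True  # Python reports the unexpected keyword argument
```

Code that starts failing after the upgrade for this reason was most likely passing an argument that never had any effect.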
Experimental features
Some new algorithms and features are still under active development, and are available as an experimental preview. However, they may not be easy to use: their documentation or testing may be incomplete, and they may change dramatically from release to release. The experimental status is noted in the documentation and at runtime via prominent warnings.
- GCNSupervisedGraphClassification: supervised graph classification model based on Graph Convolutional layers (GCN) #929.
Bug fixes and other changes
- `StellarGraph.to_adjacency_matrix` is at least 15× faster on undirected graphs #932
- `ClusterNodeGenerator` is now noticeably faster, which makes training and predicting with a `ClusterGCN` model faster #1095. On a random graph with 1000 nodes and 5000 edges and 10 clusters, iterating over an epoch with `q=1` (each cluster individually) is 2× faster, and is even faster for larger `q`. The model in the Cluster-GCN demo notebook using Cora trains 2× faster overall.
- The `node_features=...` parameter to `StellarGraph.from_networkx` now only needs to mention the node types that have features, when passing a dictionary of Pandas DataFrames. Node types that aren't mentioned will automatically have no features (zero-length feature vectors). #1082
- A `subgraph` method was added to `StellarGraph` for computing a node-induced subgraph #958
- A `connected_components` method was added to `StellarGraph` for computing the nodes involved in each connected component in a `StellarGraph` #958
- The `info` method on `StellarGraph` now shows only 20 node and edge types by default to be more useful for graphs with many types #993. This behaviour can be customized with the `truncate=...` parameter.
- The `info` method on `StellarGraph` now shows information about the size and type of each node type's feature vectors #979
- The `EdgeSplitter` class supports `StellarGraph` input (and will output `StellarGraph`s in this case), in addition to NetworkX graphs #1032
- The `Attri2Vec` model class stores its weights statefully, so they are shared between all tensors computed by `build` #1101
- The `GCN` model defaults for some parameters now match the `GraphConvolution` layer's defaults: specifically `kernel_initializer` (`glorot_uniform`) and `bias_initializer` (`zeros`) #1147
- The `datasets` subm...
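To make the terms "connected component" and "node-induced subgraph" from the list above concrete, here is a dependency-free sketch over a plain edge list. The function names mirror the concepts only. The signatures, return types and the largest-first ordering are assumptions and do not describe the `StellarGraph` methods themselves.

```python
from collections import deque


def connected_components(nodes, edges):
    """Return the node sets of each connected component, largest first."""
    neighbours = {node: set() for node in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Breadth-first search collects everything reachable from `start`.
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nxt in neighbours[current]:
                if nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        components.append(component)
    components.sort(key=len, reverse=True)
    return components


def node_induced_subgraph(edges, keep):
    """Keep only the edges whose two endpoints are both in `keep`."""
    keep = set(keep)
    return [(a, b) for a, b in edges if a in keep and b in keep]
```

Combining the two gives a common preprocessing step: take the largest component and restrict the edge list to it.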
Release 0.10.0
Major features and improvements
- The `StellarGraph` and `StellarDiGraph` classes are now backed by NumPy and Pandas #752. The `StellarGraph(...)` and `StellarDiGraph(...)` constructors now consume Pandas DataFrames representing node features and the edge list. This significantly reduces the memory use and construction time for these `StellarGraph` objects.

  The following table shows some measurements of the memory use of `g = StellarGraph(...)`, and the time required for that constructor call, for several real-world datasets of different sizes, for both the old form backed by NetworkX code and the new form backed by NumPy and Pandas (both old and new store node features similarly, using 2D NumPy arrays, so the measurements in this table include only graph structure: the edges and nodes themselves):

  | dataset | nodes | edges | size old (MiB) | size new (MiB) | size change | time old (s) | time new (s) | time change |
  |---|---|---|---|---|---|---|---|---|
  | Cora | 2708 | 5429 | 4.1 | 1.3 | -69% | 0.069 | 0.034 | -50% |
  | FB15k | 14951 | 592213 | 148 | 28 | -81% | 5.5 | 1.2 | -77% |
  | Reddit | 231443 | 11606919 | 6611 | 493 | -93% | 154 | 33 | -82% |

  The old backend has been removed, and conversion from a NetworkX graph should be performed via the `StellarGraph.from_networkx` function (the existing form `StellarGraph(networkx_graph)` is supported in this release but is deprecated, and may be removed in a future release).
- More detailed information about Heterogeneous GraphSAGE (HinSAGE) has been added to StellarGraph's readthedocs documentation #839.
- New algorithms:
Breaking changes
- Some layers and models had many parameters move from `**kwargs` to real arguments: `GraphConvolution`, `GCN`. #801 Invalid (e.g. incorrectly spelled) arguments would have been ignored previously, but now may fail with a `TypeError`; to fix, remove or correct the arguments.
- The `stellargraph.data.load_dataset_BlogCatalog3` function has been replaced by the `load` method on `stellargraph.datasets.BlogCatalog3` #888. Migration: replace `load_dataset_BlogCatalog3(location)` with `BlogCatalog3().load()`; code required to find the location or download the dataset can be removed, as `load` now does this automatically.
- `stellargraph.data.train_test_val_split` and `stellargraph.data.NodeSplitter` have been removed. #887 Migration: this functionality should be replaced with `pandas` and `sklearn` (for instance, `sklearn.model_selection.train_test_split`).
- Most of the submodules in `stellargraph.utils` have been moved to top-level modules: `stellargraph.calibration`, `stellargraph.ensemble`, `stellargraph.losses` and `stellargraph.interpretability` #938. Imports from the old location are now deprecated, and may stop working in future releases. See the linked issue for the full list of changes.
Experimental features
Some new algorithms and features are still under active development, and are available as an experimental preview. However, they may not be easy to use: their documentation or testing may be incomplete, and they may change dramatically from release to release. The experimental status is noted in the documentation and at runtime via prominent warnings.
- Temporal Random Walks: random walks that respect the time at which each edge occurred (stored as edge weights) #787. The implementation does not have an example, and lacks thorough testing and documentation.
- Watch Your Step: computes node embeddings by simulating the effect of random walks, rather than performing them #750. The implementation is not fully tested.
- ComplEx: computes embeddings for nodes and edge types in knowledge graphs, and uses these to perform link prediction #756. The implementation hasn't been validated to match the paper.
- Neo4j connector: the GraphSAGE algorithm can perform neighbourhood sampling in a Neo4j database, so that the edges of a graph do not have to fit entirely into memory #799. The implementation is not automatically tested, and doesn't support functionality like loading node feature vectors from Neo4j.
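To make the idea of a time-respecting walk concrete, here is a minimal sketch in plain Python. It is not the StellarGraph implementation: the temporal_random_walk helper, the undirected treatment of edges, and the rule that each step must use a strictly later edge than the previous one are all assumptions made for this illustration.

```python
import random
from collections import defaultdict


def temporal_random_walk(edges, start, max_length, rng):
    """Walk from `start`, only following edges that occur strictly later
    than the edge used for the previous step (hypothetical helper)."""
    # Adjacency list: node -> list of (neighbour, time) pairs, undirected.
    adjacency = defaultdict(list)
    for source, target, time in edges:
        adjacency[source].append((target, time))
        adjacency[target].append((source, time))

    walk, times = [start], []
    current, current_time = start, float("-inf")
    while len(walk) < max_length:
        # Keep only the edges that respect the time ordering.
        candidates = [(n, t) for n, t in adjacency[current] if t > current_time]
        if not candidates:
            break  # no later edge is available, so the walk ends early
        current, current_time = rng.choice(candidates)
        walk.append(current)
        times.append(current_time)
    return walk, times


# Each edge is (source, target, time); the time plays the role of the edge weight.
edges = [("a", "b", 1), ("b", "c", 2), ("c", "d", 3), ("b", "d", 5), ("a", "c", 4)]
walk, times = temporal_random_walk(edges, "a", max_length=4, rng=random.Random(0))
print(walk, times)
```

Whatever path is sampled, the recorded edge times increase along the walk, which is the property that distinguishes it from an ordinary random walk.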
Bug fixes and other changes
- StellarGraph now supports TensorFlow 2.1, which includes GPU support by default: #875
- Demos now focus on Jupyter notebooks, and demo scripts that duplicate notebooks have been removed: #889
- The following algorithms are now reproducible:
- Supervised GraphSAGE Node Attribute Inference #844
- Randomness can be more easily controlled using stellargraph.random.set_seed #806
- StellarGraph.edges() can return edge weights as a separate NumPy array with include_edge_weights=True #754
- StellarGraph.to_networkx supports ignoring node features (and thus being a little more efficient) with feature_name=None #841
- StellarGraph.to_adjacency_matrix now ignores edge weights (that is, defaults every weight to 1) by default, unless weighted=True is specified #857
- stellargraph.utils.plot_history visualises the model training history as a plot for each metric (such as loss) #902
- The saliency maps/interpretability code has been refactored to share more code and to be cleaner and easier to extend #855
- DevOps changes:
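The to_adjacency_matrix change listed above (weights ignored unless weighted=True) can be pictured with a small stand-alone sketch. The adjacency_matrix function below is a hypothetical helper written for this illustration in plain Python; it is not the library's method, and details such as the dense list-of-rows output and the undirected handling are assumptions.

```python
def adjacency_matrix(nodes, edges, weighted=False):
    """Dense adjacency matrix (list of rows) for an undirected edge list.

    Hypothetical helper: with weighted=False every edge counts as 1,
    with weighted=True the stored edge weight is used instead.
    """
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0.0] * len(nodes) for _ in nodes]
    for source, target, weight in edges:
        value = weight if weighted else 1.0
        i, j = index[source], index[target]
        matrix[i][j] += value
        if i != j:
            matrix[j][i] += value
    return matrix


nodes = ["a", "b", "c"]
edges = [("a", "b", 2.5), ("b", "c", 0.5)]

unweighted = adjacency_matrix(nodes, edges)
weighted = adjacency_matrix(nodes, edges, weighted=True)

print(unweighted)  # [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
print(weighted)    # [[0.0, 2.5, 0.0], [2.5, 0.0, 0.5], [0.0, 0.5, 0.0]]
```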
Release 0.9.0
Major features and improvements
- StellarGraph is now available as a conda package on Anaconda Cloud #516
- New algorithms:
- Cluster-GCN: an extension of GCN that can be trained using SGD, with demo #487
- Relational-GCN (RGCN): a generalisation of GCN to relational/multi edge type graphs, with demo #490
- Link prediction for full-batch models: FullBatchLinkGenerator allows doing link prediction with algorithms like GCN, GAT, APPNP and PPNP #543
- Unsupervised GraphSAGE has now been updated and tested for reproducibility. Provided all seeds are set, running the same pipeline should give reproducible embeddings. #620
- A datasets subpackage provides easier access to sample datasets with inbuilt downloading. #690
Breaking changes
- The stellargraph library now only supports tensorflow version 2.0 #518, #732. Backward compatibility with earlier versions of tensorflow is not guaranteed.
- The stellargraph library now only supports Python versions 3.6 and above #641. Backward compatibility with earlier versions of Python is not guaranteed.
- The StellarGraph class no longer exposes NetworkX internals, only required functionality. In particular, calls like list(G) will no longer return a list of nodes; use G.nodes() instead. #297 If NetworkX functionality is required, use the new .to_networkx() method to convert to a normal networkx.MultiGraph or networkx.MultiDiGraph.
- Passing a NodeSequence or LinkSequence object to GraphSAGE and HinSAGE classes is now deprecated and no longer supported #498. Users might need to update their calls of GraphSAGE and HinSAGE classes by passing generator objects instead of generator.flow() objects.
- Various methods on StellarGraph have been renamed to be more succinct and uniform:
- get_feature_for_nodes is now node_features
- type_for_node is now node_type
- Neighbourhood methods in the StellarGraph class (neighbors, in_nodes, out_nodes) now return a list of neighbours instead of a set. This addresses #653. This means multi-edges are no longer collapsed into one in the return value. There will be an implicit change in behaviour for explorer classes used for algorithms like GraphSAGE and Node2Vec, since a neighbour connected via multiple edges will now be more likely to be sampled. If this is not the desired behaviour, consider pruning the graph of multi-edges before running the algorithm.
- GraphSchema has been simplified to remove type look-ups for individual nodes and edges #702 #703. Migration: for nodes, use StellarGraph.node_type; for edges, use the triple argument to the edges method, or filter when doing neighbour queries using the edge_types argument.
- NodeAttributeSpecification and the supporting Converter classes have been removed #707. Migration: use the more powerful and flexible preprocessing tools from pandas and sklearn (see the linked PR for specifics)
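The change from sets to lists in the neighbourhood methods affects sampling, which a short plain-Python sketch can make concrete. The neighbour data below is hypothetical, and the sketch assumes a sampler that picks uniformly from whatever collection of neighbours it is given; it does not call any StellarGraph code.

```python
from collections import Counter

# Hypothetical multigraph: node "a" is linked to "b" by three parallel edges
# and to "c" by a single edge.
neighbours_as_list = ["b", "b", "b", "c"]    # multi-edges kept (new behaviour)
neighbours_as_set = set(neighbours_as_list)  # multi-edges collapsed (old behaviour)


def sampling_probabilities(neighbours):
    """Probability of drawing each node when sampling uniformly from `neighbours`."""
    counts = Counter(neighbours)
    total = sum(counts.values())
    return {node: count / total for node, count in counts.items()}


list_probabilities = sampling_probabilities(neighbours_as_list)
set_probabilities = sampling_probabilities(neighbours_as_set)

print(list_probabilities)  # {'b': 0.75, 'c': 0.25}
print(set_probabilities)   # "b" and "c" are each drawn with probability 0.5

# Pruning multi-edges first restores the old probabilities.
pruned = list(dict.fromkeys(neighbours_as_list))
print(pruned)  # ['b', 'c']
```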
Experimental features
Some new algorithms and features are still under active development, and are available as an experimental preview. However, they may not be easy to use: their documentation or testing may be incomplete, and they may change dramatically from release to release. The experimental status is noted in the documentation and at runtime via prominent warnings.
- The StellarGraph and StellarDiGraph classes support using a backend based on NumPy and Pandas that uses dramatically less memory for large graphs than the existing NetworkX-based backend #668. The new backend can be enabled by constructing with StellarGraph(nodes=..., edges=...) using Pandas DataFrames, instead of a NetworkX graph.
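A rough sketch of the kind of DataFrames such a constructor call expects is shown below. Only pandas is used so that the snippet runs without StellarGraph installed. The layout (node IDs as the index of the node table, edge endpoints in columns named source and target) is an assumption made for this illustration and should be checked against the documentation of the installed version.

```python
import pandas as pd

# Node table: one row per node, indexed by node ID; columns hold features.
nodes = pd.DataFrame(
    {"feature_0": [0.1, 0.2, 0.3], "feature_1": [1.0, 0.0, 1.0]},
    index=["a", "b", "c"],
)

# Edge table: one row per edge. The "source" and "target" column names are
# an assumption for this sketch.
edges = pd.DataFrame({"source": ["a", "b"], "target": ["b", "c"]})

# Sanity check: every edge endpoint should refer to a row of the node table.
endpoints = set(edges["source"]) | set(edges["target"])
missing = endpoints - set(nodes.index)
print(missing)  # set()

# With StellarGraph installed, the graph would then be constructed as in the
# release note above:
# from stellargraph import StellarGraph
# graph = StellarGraph(nodes=nodes, edges=edges)
```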
Bug fixes and other changes
- Documentation for every released version is published under a permanent URL, in addition to the stable alias for the latest release, e.g. https://stellargraph.readthedocs.io/en/v0.8.4/ for v0.8.4 #612
- Neighbourhood methods in the StellarGraph class (neighbors, in_nodes, out_nodes) now support additional parameters to include edge weights in the results or filter by a set of edge types. #646
- Changed GraphSAGE and HinSAGE class API to accept generator objects the same as GCN/GAT models. Passing a NodeSequence or LinkSequence object is now deprecated. #498
- SampledBreadthFirstWalk, SampledHeterogeneousBreadthFirstWalk and DirectedBreadthFirstNeighbours have been made 1.2-1.5× faster #628
- UniformRandomWalk has been made 2× faster #625
- FullBatchNodeGenerator.flow has been reduced from O(n^2) quadratic complexity to O(n), where n is the number of nodes in the graph, making it orders of magnitude faster for large graphs #513
- The dependencies required for demos and testing have been included as "extras" in the main package: demos and igraph for demos, and test for testing. For example, pip install stellargraph[demos,igraph] will install the dependencies required to run every demo. #661
- The StellarGraph and StellarDiGraph constructors now list their arguments explicitly for clearer documentation (rather than using *arg and **kwargs splats) #659
- sys.exit(0) is no longer called on failure in load_dataset_BlogCatalog3 #648
- Warnings are printed using the Python warnings module #583
- Numerous DevOps changes:
- CI results are now publicly viewable: https://buildkite.com/stellar/stellargraph-public
- CI: #524, #534, #544, #550, #551, #557, #562, #574, #578, #579, #587, #592, #595, #596, #602, #609, #613, #615, #631, #637, #639, #640, #652, #656, #663, #675
- Git and GitHub configuration: #516, #588, #624, #672, #682, #683
- Other: #523, #582, #590, #654
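Because warnings now go through the standard warnings module, callers can record, filter or escalate them with the usual tools. The sketch below uses only the standard library; experimental_feature is a hypothetical function invented for this illustration, and the UserWarning category is an assumption rather than a statement about which category StellarGraph emits.

```python
import warnings


def experimental_feature():
    """Hypothetical function that announces its status via the warnings module."""
    warnings.warn(
        "experimental_feature may change between releases",
        UserWarning,
        stacklevel=2,
    )
    return 42


# Record warnings instead of printing them, for example inside a test.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = experimental_feature()

print(result, len(caught), caught[0].category.__name__)  # 42 1 UserWarning

# Escalate the same warning to an error to find out where it is raised.
with warnings.catch_warnings():
    warnings.simplefilter("error")
    try:
        experimental_feature()
    except UserWarning as error:
        print("raised:", error)
```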
Release 0.8.4 (hotfix)
Fixed bugs:
- Fix DirectedGraphSAGENodeGenerator always hitting TypeError exception. #695