
This package contains the code and notebooks developed on the EXPERT project to fuse a variety of multilingual heterogeneous open-source data streams, e.g., publications, institutional web pages, conference pages, and researcher profiles, to convert unstructured data into knowledge summaries and construct dynamically evolving proliferation expertise graphs for descriptive, predictive, and prescriptive analytics.

Install and Use

This package can be installed locally as a static install using:

python install

or alternatively as a development install (incorporating updates when changes are made to files) using:

python develop

If installed as a static install, the package must be reinstalled to incorporate updates.

If installed as a development install, changes to the source files take effect in the installed package immediately, without rerunning the installation command.
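To confirm which kind of install is active, check where Python resolves the package from: a development install points back into the source checkout, while a static install points into `site-packages`. The sketch below is a hedged illustration using only the standard library; it assumes the import name is `expert` and that the repository is checked out in the current directory, and the helper names `module_origin` and `is_development_install` are hypothetical, not part of the package.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def module_origin(name: str) -> Optional[Path]:
    """Return the file a top-level module would be imported from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).resolve()


def is_development_install(name: str, checkout_dir: str) -> bool:
    """True if `name` resolves to a file inside the given source checkout."""
    origin = module_origin(name)
    if origin is None:
        return False
    checkout = Path(checkout_dir).resolve()
    return checkout == origin or checkout in origin.parents


if __name__ == "__main__":
    # Assumption: the package imports as `expert` and the repo is checked out
    # in the current working directory; adjust both for your environment.
    print(module_origin("expert"))
    print(is_development_install("expert", "."))
```

If `module_origin("expert")` prints a path under `site-packages`, the package was installed statically and must be reinstalled to pick up changes.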

External Install Dependencies


To use the SPERT model for the local content graph:

  1. Clone the SPERT repo:
  2. Download the data and models:
bash ./scripts/
bash ./scripts/
  3. Point the local graph construction to the location of the SPERT repo:
graph = lcg.localContentGraph(text=text,
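SPERT (SpERT, a span-based joint entity and relation extraction model) predicts typed entity spans and typed relations between them, which a local content graph can treat as nodes and edges. The sketch below is a hedged illustration, assuming SpERT's JSON document layout: `tokens` is a list of strings, `entities` holds `{"type", "start", "end"}` token spans with an exclusive `end`, and `relations` holds `{"type", "head", "tail"}` indices into `entities`. It does not call `lcg.localContentGraph`, and the function name `spert_to_triples` is hypothetical.

```python
import json
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def spert_to_triples(documents: List[Dict]) -> List[Triple]:
    """Convert SpERT-style documents into (head, relation, tail) text triples.

    Assumed format: entity spans use exclusive `end` token offsets, and each
    relation's `head`/`tail` are indices into that document's `entities` list.
    """
    triples: List[Triple] = []
    for doc in documents:
        tokens = doc["tokens"]
        # Render each entity span back to its surface text.
        spans = [" ".join(tokens[e["start"]:e["end"]]) for e in doc["entities"]]
        for rel in doc.get("relations", []):
            triples.append((spans[rel["head"]], rel["type"], spans[rel["tail"]]))
    return triples


if __name__ == "__main__":
    example = json.loads("""
    [{"tokens": ["Marie", "Curie", "worked", "at", "the", "Sorbonne", "."],
      "entities": [{"type": "Peop", "start": 0, "end": 2},
                   {"type": "Org", "start": 5, "end": 6}],
      "relations": [{"type": "Work_For", "head": 0, "tail": 1}]}]
    """)
    print(spert_to_triples(example))
    # [('Marie Curie', 'Work_For', 'Sorbonne')]
```

Each triple maps directly onto a labeled, directed edge between two entity nodes; entity types could be kept as node attributes if needed.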

PDF parsing

Install GROBID using the instructions here:

Then convert a directory of PDF files to TEI XML with GROBID's batch mode:

java -jar grobid-core/build/libs/grobid-core-0.6.2-onejar.jar -gH grobid-home -dIn /path/to/pdf/files/ -dOut /path/to/output/directory/  -exe processFullText -ignoreAssets

Examples for parsing the resulting XML files can be found in the GROBID Parsing Example notebook.
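For a quick look at the batch output outside the notebook, the TEI XML can be read with the standard library. The sketch below is a hedged illustration, not the notebook's code: it assumes GROBID's usual TEI layout (title under `titleStmt`, abstract under `profileDesc`, body sections as `div` elements with a `head` and `p` children), and the function name `parse_grobid_tei` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# GROBID writes TEI XML; every element lives in the TEI namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def _text(elem: Optional[ET.Element]) -> str:
    """Concatenate all text inside `elem`, collapsing runs of whitespace."""
    if elem is None:
        return ""
    return " ".join("".join(elem.itertext()).split())


def parse_grobid_tei(xml_string: str) -> Dict[str, object]:
    """Pull the title, abstract, and body sections out of a GROBID TEI document."""
    root = ET.fromstring(xml_string)
    title = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    abstract = root.find(".//tei:profileDesc/tei:abstract", TEI_NS)
    sections: List[Dict[str, str]] = []
    for div in root.findall(".//tei:text/tei:body/tei:div", TEI_NS):
        paragraphs = [_text(p) for p in div.findall("tei:p", TEI_NS)]
        sections.append({"heading": _text(div.find("tei:head", TEI_NS)),
                         "text": " ".join(paragraphs)})
    return {"title": _text(title), "abstract": _text(abstract), "sections": sections}


if __name__ == "__main__":
    sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
      <teiHeader>
        <fileDesc><titleStmt><title level="a" type="main">A Test Paper</title></titleStmt></fileDesc>
        <profileDesc><abstract><div><p>Short abstract.</p></div></abstract></profileDesc>
      </teiHeader>
      <text><body>
        <div><head>Introduction</head><p>First paragraph.</p><p>Second.</p></div>
      </body></text>
    </TEI>"""
    print(parse_grobid_tei(sample))
```

To process a whole batch, read each XML file from the `-dOut` directory and pass its contents to the parser; missing elements simply come back as empty strings.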

Documentation of Repository


- Directory containing example notebooks; the README in this directory summarizes the included examples and the functionality of each notebook.


- Directory containing the package classes, scripts, and other code-related files for the **expert** package

- code to extract ESTEEM embeddings



- helper functions for deduplication of entity dataframes


- class to merge entities based on text and graph similarity


- wrapper class for interactive merging using the IdentityMatcher widget and entityMerger objects, maintaining history and provenance of merge choices.

- code to generate global content graphs from scientific publication data

- code to resolve multiple representations into single nodes in graphs

- code to generate a local content graph from scientific publication data



- code to generate a context graph from publication metadata


- code to format and load Scopus, Web of Science, OSTI, arXiv, bioRxiv, and DBLP publication metadata

- nuclear related keywords and terms compiled using IAEA glossary and SME knowledge

- code to generate Semantic Role Label (SRL)-based content graphs


- mappings and scripts to update mappings of abbreviations used in arXiv, MSC, and PACS taxonomies

- topic modelling, including graph visualizations of related topics rendered as word-cloud nodes

- visualization functions, e.g., Graphviz-based graph visualizations
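The deduplication and merging code listed above combines text and graph similarity. As a text-only illustration (not the package's `entityMerger` class), entity names can be clustered by string similarity using the standard library's `difflib.SequenceMatcher`, with union-find so that clusters are transitive. The normalization rule, the 0.85 threshold, and the function name `group_similar_names` are assumptions chosen for this sketch.

```python
from difflib import SequenceMatcher
from typing import Dict, List


def _normalize(name: str) -> str:
    """Lowercase, treat periods as spaces, and collapse whitespace."""
    return " ".join(name.lower().replace(".", " ").split())


def group_similar_names(names: List[str], threshold: float = 0.85) -> List[List[str]]:
    """Cluster names whose normalized similarity ratio is at least `threshold`."""
    parent = list(range(len(names)))

    def find(i: int) -> int:
        # Follow parent links to the cluster root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    normalized = [_normalize(n) for n in names]
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            ratio = SequenceMatcher(None, normalized[i], normalized[j]).ratio()
            if ratio >= threshold:
                parent[find(i)] = find(j)  # union the two clusters

    clusters: Dict[int, List[str]] = {}
    for i, name in enumerate(names):
        clusters.setdefault(find(i), []).append(name)
    return list(clusters.values())


if __name__ == "__main__":
    print(group_similar_names(["J. Smith", "J Smith", "Alice Wong"]))
    # [['J. Smith', 'J Smith'], ['Alice Wong']]
```

Pairwise comparison is quadratic in the number of names, which is acceptable for small candidate sets; larger entity tables usually need blocking (for example, comparing only names that share a token) before scoring.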

Graph Benchmarks

Graph datasets can be accessed from the Berkeley Data Cloud (BDC); an account is required for access. The BDC is a data sharing and management platform for US Government-supported nuclear nonproliferation research projects.

This material was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor the United States Department of Energy, nor Battelle, nor any of their employees, nor any jurisdiction or organization that has cooperated in the development of these materials, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, software, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof, or Battelle Memorial Institute. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.

Operated by Battelle for the United States Department of Energy under Contract DE-AC05-76RL01830

Mar 28, 2022