Netts package

If you don't want to use the netts command line interface (CLI), or want more control over netts, you can use the netts Python package directly. Here we'll run through the CLI example in Python.

Make sure you have a transcript in your working directory.

Follow along - Example transcript

To follow along, create the example transcript file by running the following command in a terminal:

echo "I see a man and he is wearing a jacket. He is standing in the dark against a light post. On the picture there seems to be like a park and... Or trees but in those trees there are little balls of light reflections as well. I cannot see the... Anything else because it’s very dark. But the man on the picture seems to wear a hat and he seems to have a hoodie on as well. The picture is very mysterious, which I like about it, but for me I would like to understand more about the picture." > transcript.txt
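If you would rather stay in Python (for example, on a system without a Unix-style shell), the same file can be written with the standard library. This is a minimal sketch that reproduces the text of the echo command above, including the trailing newline that echo adds:

```python
from pathlib import Path

# Same text as the echo command above, split across lines for readability.
transcript = (
    "I see a man and he is wearing a jacket. "
    "He is standing in the dark against a light post. "
    "On the picture there seems to be like a park and... "
    "Or trees but in those trees there are little balls of light reflections as well. "
    "I cannot see the... Anything else because it’s very dark. "
    "But the man on the picture seems to wear a hat "
    "and he seems to have a hoodie on as well. "
    "The picture is very mysterious, which I like about it, "
    "but for me I would like to understand more about the picture."
)

# Write the transcript to the working directory as UTF-8.
Path("transcript.txt").write_text(transcript + "\n", encoding="utf-8")
```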

Process a single transcript

We can then process a single transcript using the following Python script, which we'll run through step by step.

import matplotlib.pyplot as plt

import netts

# Load the settings and download the extra dependencies (a no-op if already installed)
settings = netts.get_settings()
print(f"Installing dependencies to {settings.netts_dir}")
netts.install_dependencies()

# Start the OpenIE5 and CoreNLP servers; both shut down when the with block exits
with netts.OpenIEClient() as openie_client, netts.CoreNLPClient(
    properties={"annotators": "tokenize,ssplit,pos,lemma,parse,depparse,coref,openie"},
) as corenlp_client:

    # Read the transcript and build its speech graph
    with open("transcript.txt", encoding="utf-8") as f:
        transcript = f.read()

    graph = netts.SpeechGraph(transcript)

    graph.process(
        openie_client=openie_client,
        corenlp_client=corenlp_client,
        preprocess_config=settings.netts_config.preprocess,
    )

# Plot the graph and save the figure
fig, ax = plt.subplots()
graph.plot_graph(ax)

plt.savefig("transcript.png")

# Pickle the underlying graph for later analysis
with open("transcript.pkl", "wb") as output_f:
    netts.pickle_graph(graph.graph, output_f)

First we load a Settings object, which provides information about the netts configuration. We then print where netts will install its additional dependencies, and finally download them. This can take a long time (~20 min), so it's time to put the kettle on.

Info

If the dependencies have already been installed, this function does nothing.

Start the CoreNLP and OpenIE5 servers

Netts uses OpenIE5 and CoreNLP under the hood. These are both Java programs that we installed in the previous step. We use a context manager to start the servers, which makes sure they are both shut down automatically when processing finishes.

Warning

The servers are extremely memory hungry, using ~8 GB. If a server fails to start, it has probably run out of memory, and this failure can be silent. Try again on a machine with more memory.
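Because a failed start can be silent, it may help to check how much memory the machine has before starting the servers. The following is a minimal sketch using only the Python standard library. The helper `total_memory_gb` and the `REQUIRED_GB` threshold are hypothetical names for illustration and are not part of netts. It reports total rather than free memory, so treat the result as a rough guide only.

```python
import os


def total_memory_gb():
    """Return total physical memory in GB, or None if it cannot be determined."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf does not exist on Windows, and some platforms lack these names.
        return None
    if page_size <= 0 or page_count <= 0:
        return None
    return page_size * page_count / 1024**3


REQUIRED_GB = 8  # approximate figure taken from the warning above

memory_gb = total_memory_gb()
if memory_gb is None:
    print("Could not determine total memory; proceed with care.")
elif memory_gb < REQUIRED_GB:
    print(f"Only {memory_gb:.1f} GB of RAM found; the servers may fail to start.")
else:
    print(f"{memory_gb:.1f} GB of RAM found.")
```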

Process a transcript

Next, we load our transcript into memory, create a SpeechGraph object, and call its process method, passing the two clients and a configuration object.

Here we use the default configuration object, settings.netts_config.preprocess.

Plot graph and save outputs

Finally, we plot our graph, save the figure to transcript.png, and pickle the graph object to transcript.pkl for later analysis.
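To pick up the analysis later, the pickled graph has to be read back in. The sketch below assumes that netts.pickle_graph writes a standard Python pickle that `pickle.load` can read, which this page does not confirm. The helper `load_graph` is a hypothetical name, not part of netts. The packages used by the pickled object need to be installed in the environment that loads it, and you should only unpickle files you trust.

```python
import pickle
from pathlib import Path


def load_graph(path):
    """Load a pickled graph object from disk (assumes a standard pickle file)."""
    with open(path, "rb") as input_f:
        return pickle.load(input_f)


pickle_path = Path("transcript.pkl")
if pickle_path.exists():
    # Only runs once the script above has produced transcript.pkl.
    loaded_graph = load_graph(pickle_path)
    print(type(loaded_graph))
```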

Refining the plot

The network is plotted using spring-embedding, which tries to position the nodes so that nodes and edges overlap as little as possible. This also means that the network will look slightly different each time you plot it. If you are not happy with the layout, re-run the last few lines of code and look at the transcript.png image file again:

fig, ax = plt.subplots()
graph.plot_graph(ax)

plt.savefig("transcript.png")

You can also open the figure in an interactive window by running:

fig, ax = plt.subplots()
graph.plot_graph(ax)

plt.show()

When you resize the window, the network automatically adjusts to the new size and the nodes and edges spread out. This usually makes the network much easier to read. We therefore recommend trying this step when processing your first few transcripts and inspecting the networks.
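To see why a spring-embedded plot changes between runs, here is a toy force-directed layout in pure Python. It is an illustration of the general technique, not the code netts uses: the function `spring_layout`, its parameters, and the example nodes and edges are all made up for this sketch. Nodes start at random positions, every pair of nodes repels, connected nodes attract, and a different random start generally settles into a different picture.

```python
import math
import random


def spring_layout(nodes, edges, seed=None, iterations=50):
    """Toy force-directed layout: returns {node: (x, y)}."""
    rng = random.Random(seed)
    # Random starting positions are the source of run-to-run variation.
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # preferred distance between nodes
    for step in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Every pair of nodes repels each other.
        for a in nodes:
            for b in nodes:
                if a == b:
                    continue
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist
                disp[a][0] += dx / dist * force
                disp[a][1] += dy / dist * force
        # Connected nodes attract each other.
        for a, b in edges:
            dx = pos[a][0] - pos[b][0]
            dy = pos[a][1] - pos[b][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = dist * dist / k
            disp[a][0] -= dx / dist * force
            disp[a][1] -= dy / dist * force
            disp[b][0] += dx / dist * force
            disp[b][1] += dy / dist * force
        # Move each node a limited distance; the limit shrinks every step.
        limit = 0.1 * (1 - step / iterations)
        for n in nodes:
            length = max(math.hypot(*disp[n]), 1e-9)
            move = min(length, limit)
            pos[n][0] += disp[n][0] / length * move
            pos[n][1] += disp[n][1] / length * move
    return {n: tuple(p) for n, p in pos.items()}


# Hypothetical nodes and edges, loosely based on the example transcript.
nodes = ["man", "jacket", "hat", "picture"]
edges = [("man", "jacket"), ("man", "hat"), ("picture", "man")]

first = spring_layout(nodes, edges, seed=1)
second = spring_layout(nodes, edges, seed=2)
print(first["man"], second["man"])
```

Fixing the seed makes this toy layout repeatable, while leaving it as None gives a new arrangement on each call, which mirrors the behaviour described above.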