Netts package
If you don't want to use the netts command line interface (CLI), or want more control over netts, you can use the netts Python package directly. Here we'll run through the CLI example in Python.
Make sure you have a transcript in your working directory.
Follow along - Example transcript
To follow along, create this example in a file by running the following command in a terminal:
echo "I see a man and he is wearing a jacket. He is standing in the dark against a light post. On the picture there seems to be like a park and... Or trees but in those trees there are little balls of light reflections as well. I cannot see the... Anything else because it’s very dark. But the man on the picture seems to wear a hat and he seems to have a hoodie on as well. The picture is very mysterious, which I like about it, but for me I would like to understand more about the picture." > transcript.txt
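If you would rather stay in Python, the same file can be written with the standard library. This is a minimal sketch; the helper name `write_example_transcript` is our own and is not part of netts.

```python
from pathlib import Path

# The example transcript, split over several lines for readability.
EXAMPLE_TRANSCRIPT = (
    "I see a man and he is wearing a jacket. "
    "He is standing in the dark against a light post. "
    "On the picture there seems to be like a park and... "
    "Or trees but in those trees there are little balls of light "
    "reflections as well. "
    "I cannot see the... Anything else because it’s very dark. "
    "But the man on the picture seems to wear a hat and he seems to have "
    "a hoodie on as well. "
    "The picture is very mysterious, which I like about it, but for me "
    "I would like to understand more about the picture."
)


def write_example_transcript(path="transcript.txt"):
    """Write the example transcript to `path` and return it as a Path."""
    target = Path(path)
    # `echo` adds a trailing newline, so we add one too.
    target.write_text(EXAMPLE_TRANSCRIPT + "\n", encoding="utf-8")
    return target
```

Calling `write_example_transcript()` in your working directory produces the `transcript.txt` file used in the rest of this page.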
Process a single transcript
We can then process a single transcript using the following Python script, which we'll run through step by step.
import matplotlib.pyplot as plt

import netts

# Load the settings and install the dependencies (no-op if already installed)
settings = netts.get_settings()
print(f"Installing dependencies to {settings.netts_dir}")
netts.install_dependencies()

# Start the OpenIE5 and CoreNLP servers; both are shut down on exit
with netts.OpenIEClient() as openie_client, netts.CoreNLPClient(
    properties={"annotators": "tokenize,ssplit,pos,lemma,parse,depparse,coref,openie"},
) as corenlp_client:

    # Load the transcript into memory
    with open("transcript.txt", encoding="utf-8") as f:
        transcript = f.read()

    # Build the speech graph from the transcript
    graph = netts.SpeechGraph(transcript)
    graph.process(
        openie_client=openie_client,
        corenlp_client=corenlp_client,
        preprocess_config=settings.netts_config.preprocess,
    )

# Plot the graph and save the image
fig, ax = plt.subplots()
graph.plot_graph(ax)
plt.savefig("transcript.png")

# Pickle the graph for later analysis
with open("transcript.pkl", "wb") as output_f:
    netts.pickle_graph(graph.graph, output_f)
First we load a Settings object, which provides information about the netts configuration. We then check where netts will install additional dependencies, and finally download them. This can take a long time (~20min), so time to put the kettle on.
Info
If the dependencies have already been installed this function will do nothing.
Start the CoreNLP and OpenIE5 servers
Netts uses OpenIE5 and CoreNLP under the hood. These are both Java programs that we installed in the previous step. We use a context manager to start the servers, which makes sure they are both automatically shut down when processing finishes.
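To see why the context manager matters, here is a small self-contained sketch of the pattern. `DummyServer` is a hypothetical stand-in and not a netts class; it only records start and stop events, so you can see that shutdown still happens when processing raises an error.

```python
class DummyServer:
    """Hypothetical stand-in for a server client; not part of netts."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __enter__(self):
        # Called when the `with` block is entered: start the server.
        self.log.append(f"start {self.name}")
        return self

    def __exit__(self, exc_type, exc, tb):
        # Called when the `with` block is left, even after an exception.
        self.log.append(f"stop {self.name}")
        return False  # do not swallow the exception


def run_with_failure():
    """Start two dummy servers, fail during processing, return the event log."""
    log = []
    try:
        with DummyServer("openie", log), DummyServer("corenlp", log):
            raise RuntimeError("processing failed")
    except RuntimeError:
        log.append("error handled")
    return log
```

Both servers are stopped, in reverse order of starting, before the error reaches the `except` clause.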
Warning
The servers are extremely memory hungry, using ~8GB. If a server fails to start, it has probably run out of memory and failed silently. Try again on a machine with more memory.
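Since the failure can be silent, it may help to check the machine before starting the servers. The sketch below is an assumption-laden convenience, not part of netts: the helper names are ours, it reads the total installed memory rather than the memory that is currently free, and `os.sysconf` is only available on some Unix-like systems. The 8 GB threshold is taken from the warning above.

```python
import os


def total_memory_gb():
    """Return total physical memory in GB, or None if it cannot be determined."""
    try:
        pages = os.sysconf("SC_PHYS_PAGES")
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (AttributeError, ValueError, OSError):
        # os.sysconf or these names are not available on this platform.
        return None
    if pages <= 0 or page_size <= 0:
        return None
    return pages * page_size / 1024**3


def has_enough_memory(required_gb=8.0, total_gb=None):
    """Return True/False, or None when the memory size is unknown."""
    if total_gb is None:
        total_gb = total_memory_gb()
    if total_gb is None:
        return None
    return total_gb >= required_gb
```

A result of `False` suggests the servers are unlikely to start; `True` is no guarantee, because other programs may already be using the memory.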
Process a transcript
Next we load our transcript into memory, create a SpeechGraph object and then call its process method, passing our two server clients and a configuration object. Here we use the default configuration object, settings.netts_config.preprocess.
Plot graph and save outputs
Finally we plot our graph, save it to file and also pickle our graph object for further analysis later.
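For the later analysis, the pickled graph has to be loaded back into Python. The sketch below assumes that the file was written with Python's standard pickle format and that the stored object is a networkx graph, so that it offers `number_of_nodes()` and `number_of_edges()`. Both points are assumptions on our part, and the helper names are our own. Unpickling a networkx graph also requires networkx to be installed.

```python
import pickle


def load_graph(path):
    """Load a pickled object from `path`.

    Only unpickle files you created yourself: pickle can run arbitrary
    code while loading.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


def summarise(graph):
    """Return basic size information for a networkx-style graph object."""
    return {
        "nodes": graph.number_of_nodes(),
        "edges": graph.number_of_edges(),
    }


# Example usage, once transcript.pkl exists:
# graph = load_graph("transcript.pkl")
# print(summarise(graph))
```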
Refining the plot
The network is plotted using spring-embedding, which tries to position the nodes so that nodes and edges overlap as little as possible. The layout is not deterministic, so the network will look slightly different each time you plot it. If you are not happy with the way your network is plotted, try re-running the last few lines of code and look at the transcript.png image file again:
fig, ax = plt.subplots()
graph.plot_graph(ax)
plt.savefig("transcript.png")
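The run-to-run variation comes from the random starting positions used by spring-embedding. The following is a tiny pure-Python sketch of a force-directed layout in the Fruchterman-Reingold style, written only to illustrate the idea. It is not the code netts uses, and the function name and parameters are our own.

```python
import math
import random


def spring_layout(nodes, edges, iterations=50, seed=None):
    """Tiny force-directed (spring) layout; returns {node: (x, y)}."""
    nodes = list(nodes)
    if not nodes:
        return {}
    rng = random.Random(seed)
    # Random starting positions: the source of run-to-run variation.
    pos = {n: [rng.random(), rng.random()] for n in nodes}
    k = 1.0 / math.sqrt(len(nodes))  # ideal distance between nodes
    temperature = 0.1  # maximum move per step, cooled over time
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Every pair of nodes repels each other.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force
        # Connected nodes attract each other like springs.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force
        # Move each node, limited by the current temperature.
        for n in nodes:
            length = max(math.hypot(disp[n][0], disp[n][1]), 1e-9)
            step = min(length, temperature)
            pos[n][0] += disp[n][0] / length * step
            pos[n][1] += disp[n][1] / length * step
        temperature *= 0.95
    return {n: (p[0], p[1]) for n, p in pos.items()}
```

With `seed=None` each call starts from different random positions and so ends in a different layout, which mirrors what you see when re-plotting. Passing the same seed twice gives the same layout in this sketch.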
You can also open the figure in an interactive window by running:
fig, ax = plt.subplots()
graph.plot_graph(ax)
plt.show()
When resizing the window, we see that the network automatically adjusts to the new window size and the nodes and edges spread out. This usually helps visualisation a lot. We therefore recommend trying out this step when processing your first few transcripts and inspecting the networks.
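A similar effect can be had in the saved image by drawing on a larger figure, because node markers and labels keep their size while the canvas grows. The helper below is a sketch with a name and default sizes of our own choosing. It takes any function that draws onto a matplotlib axis, such as `graph.plot_graph` from the script above.

```python
import matplotlib.pyplot as plt


def save_large_plot(plot_fn, path, figsize=(12, 9), dpi=150):
    """Draw with `plot_fn(ax)` on a large figure and save it to `path`."""
    fig, ax = plt.subplots(figsize=figsize)
    plot_fn(ax)
    fig.savefig(path, dpi=dpi, bbox_inches="tight")
    plt.close(fig)  # free the figure once it has been written
    return path
```

For example, `save_large_plot(graph.plot_graph, "transcript_large.png")` writes a larger version of the network plot.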