
Command Line

Netts takes speech transcripts and converts them into a semantic graph. Imagine we have the following short transcript in a file called transcript.txt:

I see a man and he is wearing a jacket. He is standing in the dark against a light post. On the picture there seems to be like a park and... Or trees but in those trees there are little balls of light reflections as well. I cannot see the... Anything else because it’s very dark. But the man on the picture seems to wear a hat and he seems to have a hoodie on as well. The picture is very mysterious, which I like about it, but for me I would like to understand more about the picture.

Follow along - Example transcript

To follow along, create this example file by running the following command in a terminal:
echo "I see a man and he is wearing a jacket. He is standing in the dark against a light post. On the picture there seems to be like a park and... Or trees but in those trees there are little balls of light reflections as well. I cannot see the... Anything else because it’s very dark. But the man on the picture seems to wear a hat and he seems to have a hoodie on as well. The picture is very mysterious, which I like about it, but for me I would like to understand more about the picture." > transcript.txt

Create a semantic graph

We can create a semantic graph from the transcript using either the command line interface (CLI) or the Python package. To process a single transcript with the CLI, run:

netts run transcript.txt outputs

We can break this down into the following components:

CLI Command    transcript.txt        outputs
netts run      Path to transcript    Path of output directory
  1. transcript.txt can be replaced with the full path to any .txt file.
  2. outputs can be replaced with the path to any directory. If the directory does not exist yet, netts will create it.
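
For scripted workflows, the same command can be launched from Python. The following is a minimal sketch that only wraps the CLI call shown above; it assumes the netts executable is on your PATH, and the helper names build_netts_command and run_netts are hypothetical, not part of netts.

```python
import subprocess


def build_netts_command(transcript, output_dir):
    """Return the argument list for `netts run <transcript> <output_dir>`."""
    return ["netts", "run", str(transcript), str(output_dir)]


def run_netts(transcript, output_dir):
    """Run the CLI; raises CalledProcessError if it exits with a non-zero status."""
    # Assumption: the `netts` executable is installed and on PATH.
    command = build_netts_command(transcript, output_dir)
    return subprocess.run(command, check=True)
```

Calling run_netts("transcript.txt", "outputs") should then be equivalent to typing the command in a terminal. Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.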

Netts uses OpenIE5 and CoreNLP under the hood. Both are Java programmes that we installed in the previous step. Netts starts the two servers inside a context manager, which makes sure they are both shut down automatically when processing finishes.

Warning

The servers are extremely memory hungry, using ~8GB. If a server fails to start, the most likely cause is that the machine ran out of memory, and the failure may be silent. Try again on a machine with more memory.
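
Because an out-of-memory failure may be silent, it can help to check the machine's memory before starting. The sketch below is not part of netts; it uses only the Python standard library, reads total physical memory through os.sysconf (available on Linux and most Unix-like systems), and compares it with the approximate 8GB figure from the warning. The names total_memory_gib and has_enough_memory are hypothetical.

```python
import os

REQUIRED_GIB = 8  # approximate figure quoted in the warning, not an exact requirement


def total_memory_gib():
    """Return total physical memory in GiB, or None if it cannot be determined."""
    try:
        page_size = os.sysconf("SC_PAGE_SIZE")
        page_count = os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        # os.sysconf or these names are unavailable on this platform.
        return None
    if page_size <= 0 or page_count <= 0:
        return None
    return page_size * page_count / 1024**3


def has_enough_memory(total_gib, required_gib=REQUIRED_GIB):
    """Return True only when the memory size is known and meets the requirement."""
    return total_gib is not None and total_gib >= required_gib
```

Note that this reports total installed memory, not the memory currently free, so a passing check does not guarantee that the servers will start.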

Outputs

Once netts has processed the transcript, the output directory will contain two files:

outputs/
    transcript.pickle
    transcript.png

The file prefix is taken from the input file name (in this case, transcript.txt).

transcript.pickle: A NetworkX MultiDiGraph object.

transcript.png: A plot of the graph.
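
The pickled graph can be loaded back into Python for further analysis. The following is a minimal sketch, assuming NetworkX is installed in the environment that reads the file (unpickling a graph needs its class to be importable). The helper names load_graph and describe_graph are hypothetical; number_of_nodes and number_of_edges are standard NetworkX graph methods.

```python
import pickle
from pathlib import Path


def load_graph(pickle_path):
    """Load the pickled object that netts wrote to `pickle_path`."""
    with Path(pickle_path).open("rb") as handle:
        return pickle.load(handle)


def describe_graph(graph):
    """Return basic size information for a NetworkX graph."""
    return {
        "nodes": graph.number_of_nodes(),
        "edges": graph.number_of_edges(),
    }
```

For the example above, describe_graph(load_graph("outputs/transcript.pickle")) would report the size of the semantic graph. Only unpickle files that you created yourself or otherwise trust, since loading a pickle can execute arbitrary code.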

Process a directory of transcripts

If you have a folder of transcripts you can process the entire folder with the CLI. For example, if you have a folder called all_transcripts:

all_transcripts/
    transcript_1.txt
    transcript_2.txt
    ...

you can process it with:

netts run all_transcripts outputs
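
After a batch run it can be useful to confirm that every transcript produced its outputs. The sketch below is not part of netts. It assumes that directory mode picks up the .txt files in the folder and follows the same naming rule as the single-file case (a .pickle and a .png per transcript, named after the input file); the helper names expected_outputs and missing_outputs are hypothetical.

```python
from pathlib import Path


def expected_outputs(transcript_dir, output_dir):
    """Map each .txt transcript name to the output paths we expect for it."""
    transcript_dir, output_dir = Path(transcript_dir), Path(output_dir)
    expected = {}
    for transcript in sorted(transcript_dir.glob("*.txt")):
        stem = transcript.stem  # e.g. "transcript_1" for "transcript_1.txt"
        expected[transcript.name] = [
            output_dir / f"{stem}.pickle",
            output_dir / f"{stem}.png",
        ]
    return expected


def missing_outputs(transcript_dir, output_dir):
    """Return only the transcripts with at least one expected file missing."""
    missing = {}
    for name, paths in expected_outputs(transcript_dir, output_dir).items():
        absent = [path for path in paths if not path.exists()]
        if absent:
            missing[name] = absent
    return missing
```

An empty result from missing_outputs("all_transcripts", "outputs") suggests that every transcript was processed, under the naming assumption stated above.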