Conference abstract bi-grams – FOSS4GUK

I helped run a conference last week. As part of this I produced a word cloud from the conference abstracts. Although pretty, it could have been more informative about the conference content. This blog post shows you how to make a network of conference bi-grams instead.

[Figure: FOSS4GUK 2019 abstract bigrams]

A bi-gram is a pair of adjacent words. In the last sentence, “is a”, “a pair” and “pair of” are all bi-grams. I based this blog post on Julia Silge and David Robinson’s excellent tidytext book, Text Mining with R. As before, each abstract is stored in a separate file, so I’ve read each of those in and then turned them into a tidy bi-gram table:

library(tidyverse)
library(tidytext)
library(tidygraph)
library(ggraph)
library(extrafont)

# ----------------------------

data("stop_words")

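# one cleaned abstract per .txt file
f = list.files("~/Cloud/Michael/FOSS4G/talks/abstracts_clean/")
abstracts = lapply(f, function(i){
   read_table(paste0("~/Cloud/Michael/FOSS4G/talks/abstracts_clean/", i),
              col_names = FALSE) %>%
      # stack every text cell into a single column called word
      gather(key, word) %>%
      select(-key) %>%
      # label the rows with the file name minus its extension
      add_column(author = str_remove(i, "\\.txt$")) %>%
      # split the text into overlapping pairs of adjacent words
      unnest_tokens(bigram, word, token = "ngrams", n = 2)
})
abstracts = bind_rows(abstracts)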

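bigrams = abstracts %>%
   # one column per word of the pair
   separate(bigram, c("word1", "word2"), sep = " ") %>%
   # drop pairs containing a stop word, but keep "open"
   filter(!word1 %in% stop_words$word[stop_words$word != "open"]) %>%
   filter(!word2 %in% stop_words$word[stop_words$word != "open"]) %>%
   # drop pairs containing digits
   filter(!str_detect(word1, "[0-9]")) %>%
   filter(!str_detect(word2, "[0-9]")) %>%
   # drop missing pairs (tokens are lower-cased, so "NA" only ever matched missing values)
   filter(!is.na(word1)) %>%
   filter(!is.na(word2))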

bigram_counts = bigrams %>%
   count(word1, word2, sort = TRUE)
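
To see what the pipeline above does without needing the abstract files, here is a minimal sketch on made-up text. The `toy` table and the tiny `toy_stop` list are hypothetical stand-ins for the real abstracts and for tidytext's `stop_words`; the tokenising, separating, filtering and counting steps are the same as above.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical toy input standing in for the abstract files
toy = tibble(
   author = c("a", "b"),
   text = c("Open source geospatial software",
            "Open source mapping with open data")
)

# Hypothetical, tiny stop-word list (the post uses tidytext's stop_words)
toy_stop = c("with", "the", "of")

toy_counts = toy %>%
   # overlapping pairs of adjacent words, lower-cased by default
   unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%
   separate(bigram, c("word1", "word2"), sep = " ") %>%
   # remove any pair where either word is a stop word
   filter(!word1 %in% toy_stop, !word2 %in% toy_stop) %>%
   # how often does each pair occur across all texts?
   count(word1, word2, sort = TRUE)

print(toy_counts)
```

"open source" appears in both toy texts, so it comes out on top with a count of two, while "mapping with" and "with open" are removed by the stop-word filter.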

Then I write the graph out to a PNG. There’s some nifty stuff on the repel line which keeps the labels on the plot, and I’ve even put the text into the conference font:

png("~/Cloud/Michael/FOSS4G/talks/abstract_bigram.png",
    width = 1200, height = 850, res = 110)
bigram_counts %>%
   # keep only pairs that occur more than once
   filter(n > 1) %>%
   # word1 and word2 become the two ends of each edge
   as_tbl_graph() %>%
   ggraph(layout = "fr") +
   geom_edge_link(width = 1.1, colour = "#f49835") +
   geom_node_point(colour = "#497fbf") +
   geom_node_text(aes(label = name),
                  colour = "grey10",
                  vjust = 1, hjust = 1,
                  repel = TRUE, force = 0.1, box.padding = 0) +
   labs(title = "FOSS4GUK 2019 - Edinburgh",
        subtitle = "Abstract bigrams") +
   # conference font, made available through extrafont
   theme(text = element_text(family = "Aileron SemiBold",
                             colour = "grey10"))
dev.off()
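
The step from a count table to a network happens in `as_tbl_graph()`. As a small sketch of what that conversion produces, using a hypothetical `toy_counts` table with invented counts rather than the real conference data: the first two columns are read as the two ends of each edge, and any remaining column (here `n`) is carried along as an edge attribute.

```r
library(dplyr)
library(tidygraph)

# Hypothetical bi-gram counts; the numbers are invented for illustration
toy_counts = tibble(
   word1 = c("open", "open", "source"),
   word2 = c("source", "data", "software"),
   n = c(5, 3, 2)
)

# First two columns become edges, every distinct word becomes a node
toy_graph = as_tbl_graph(toy_counts)

# One row per distinct word, in a column called name
toy_nodes = toy_graph %>% activate(nodes) %>% as_tibble()
# One row per bi-gram, with from/to node indices and the count n
toy_edges = toy_graph %>% activate(edges) %>% as_tibble()

print(toy_nodes)
print(toy_edges)
```

The `name` column on the nodes is what the plot uses for its labels. Because `n` survives on the edges, it would also be available for styling the links by how often each bi-gram occurs, which the plot above does not do.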
