`scrape_talk()` returns a one-row tibble containing the talk's metadata, with the paragraphs of the text nested as a tibble in the `paragraphs` list-column.
library(generalconference)
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#> Loading required package: furrr
#> Loading required package: future
#> Loading required package: glue
#>
#> Attaching package: 'glue'
#> The following object is masked from 'package:dplyr':
#>
#> collapse
#> Loading required package: purrr
#> Loading required package: stringr
#> Loading required package: readr
#> Loading required package: rvest
#>
#> Attaching package: 'rvest'
#> The following object is masked from 'package:readr':
#>
#> guess_encoding
#> Loading required package: tictoc
#> Loading required package: tidyr
#> Loading required package: xml2
url <- "https://www.churchofjesuschrist.org/study/general-conference/2021/04/49nelson?lang=eng"
scrape_talk(url)
#> # A tibble: 1 × 6
#> url title1 author1 author2 kicker1 paragraphs
#> <chr> <chr> <chr> <chr> <chr> <list>
#> 1 https://www.c… Christ Is … By Presid… President of… Faith in Jesus… <tibble […
Simply unnest the paragraphs column to get all paragraphs for a talk:
scrape_talk(url) %>%
  tidyr::unnest(paragraphs)
#> # A tibble: 33 × 10
#> url title1 author1 author2 kicker1 section_num p_num p_id is_header
#> <chr> <chr> <chr> <chr> <chr> <int> <int> <chr> <lgl>
#> 1 https:… Christ … By Pres… Preside… Faith i… 0 1 p1 FALSE
#> 2 https:… Christ … By Pres… Preside… Faith i… 0 2 p3 FALSE
#> 3 https:… Christ … By Pres… Preside… Faith i… 0 3 p4 FALSE
#> 4 https:… Christ … By Pres… Preside… Faith i… 0 4 p5 FALSE
#> 5 https:… Christ … By Pres… Preside… Faith i… 0 5 p6 FALSE
#> 6 https:… Christ … By Pres… Preside… Faith i… 0 6 p9 FALSE
#> 7 https:… Christ … By Pres… Preside… Faith i… 0 7 p10 FALSE
#> 8 https:… Christ … By Pres… Preside… Faith i… 0 8 p11 FALSE
#> 9 https:… Christ … By Pres… Preside… Faith i… 0 9 p12 FALSE
#> 10 https:… Christ … By Pres… Preside… Faith i… 0 10 p13 FALSE
#> # … with 23 more rows, and 1 more variable: paragraph <chr>
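Once the paragraphs are unnested, ordinary dplyr verbs apply. The following is a minimal sketch that drops header rows and collapses the remaining paragraphs into one string. It uses a small hand-built tibble as a stand-in for the real output, so it runs without network access. The paragraph text is made up, and reading `is_header` as "this row is a section heading" is an assumption based on the column name.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for scrape_talk(url) %>% tidyr::unnest(paragraphs).
# Column names follow the output above; the values are invented for illustration.
talk_paragraphs <- tibble(
  section_num = c(0L, 0L, 1L, 1L),
  p_num       = 1:4,
  p_id        = c("p1", "p2", "p3", "p4"),
  is_header   = c(FALSE, FALSE, TRUE, FALSE),
  paragraph   = c("First paragraph.", "Second paragraph.",
                  "A Section Heading", "Third paragraph.")
)

# Keep body paragraphs only (assumes is_header marks headings),
# restore reading order, and join everything into a single text.
body_text <- talk_paragraphs %>%
  filter(!is_header) %>%
  arrange(p_num) %>%
  summarise(
    n_paragraphs = n(),
    text         = paste(paragraph, collapse = "\n\n")
  )

body_text
```

The same pipeline should work on real scraped output by replacing `talk_paragraphs` with the unnested result of `scrape_talk(url)`.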
Pull all conference URLs for April 1971:
df_conference <- scrape_conference_urls(1971, 4)
df_conference
#> # A tibble: 1 × 3
#> year month sessions
#> <dbl> <dbl> <list>
#> 1 1971 4 <tibble [7 × 4]>
Unnest to see the sessions:
df_conference %>%
  unnest(sessions)
#> # A tibble: 7 × 6
#> year month session_name session_id session_url session_talk_ur…
#> <dbl> <dbl> <chr> <int> <chr> <list>
#> 1 1971 4 Saturday Morning Session 1 /study/gen… <tibble [4 × 2]>
#> 2 1971 4 Saturday Afternoon Session 2 /study/gen… <tibble [7 × 2]>
#> 3 1971 4 Priesthood Session 3 /study/gen… <tibble [7 × 2]>
#> 4 1971 4 Sunday Morning Session 4 /study/gen… <tibble [5 × 2]>
#> 5 1971 4 Sunday Afternoon Session 5 /study/gen… <tibble [7 × 2]>
#> 6 1971 4 Tuesday Morning Session 6 /study/gen… <tibble [5 × 2]>
#> 7 1971 4 Tuesday Afternoon Session 7 /study/gen… <tibble [7 × 2]>
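Each session row carries its talk URLs in the `session_talk_urls` list-column, so talks can be counted per session without unnesting further. This is a rough sketch on a hand-built stand-in for the session table. The URL values are placeholders, not real paths, and `df_sessions` and `df_counts` are illustrative names only.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical stand-in for df_conference %>% unnest(sessions).
# The talk_urls values are placeholders, not real conference paths.
df_sessions <- tibble(
  session_name = c("Saturday Morning Session", "Saturday Afternoon Session"),
  session_id   = 1:2,
  session_talk_urls = list(
    tibble(talk_urls = paste0("/placeholder/talk-", 1:4), talk_session_id = 1:4),
    tibble(talk_urls = paste0("/placeholder/talk-", 1:7), talk_session_id = 1:7)
  )
)

# Count the rows of each nested tibble to get talks per session.
df_counts <- df_sessions %>%
  mutate(n_talks = map_int(session_talk_urls, nrow)) %>%
  select(session_name, session_id, n_talks)

df_counts
```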
Unnest to see the individual talk urls:
df_conference %>%
  unnest(sessions) %>%
  unnest(session_talk_urls)
#> # A tibble: 42 × 7
#> year month session_name session_id session_url talk_urls talk_session_id
#> <dbl> <dbl> <chr> <int> <chr> <chr> <int>
#> 1 1971 4 Saturday Mor… 1 /study/gene… /study/gen… 1
#> 2 1971 4 Saturday Mor… 1 /study/gene… /study/gen… 2
#> 3 1971 4 Saturday Mor… 1 /study/gene… /study/gen… 3
#> 4 1971 4 Saturday Mor… 1 /study/gene… /study/gen… 4
#> 5 1971 4 Saturday Aft… 2 /study/gene… /study/gen… 1
#> 6 1971 4 Saturday Aft… 2 /study/gene… /study/gen… 2
#> 7 1971 4 Saturday Aft… 2 /study/gene… /study/gen… 3
#> 8 1971 4 Saturday Aft… 2 /study/gene… /study/gen… 4
#> 9 1971 4 Saturday Aft… 2 /study/gene… /study/gen… 5
#> 10 1971 4 Saturday Aft… 2 /study/gene… /study/gen… 6
#> # … with 32 more rows
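The `talk_urls` values shown above are relative paths, whereas the earlier `scrape_talk()` example was given a full URL. One possible way to bridge the two is sketched below. `make_absolute()` is a hypothetical helper, not part of the package, and the sample rows reuse the talk path from the first example rather than real 1971 data. Whether `scrape_talk()` needs the `?lang=eng` suffix is not confirmed here.

```r
library(dplyr)
library(tibble)

# Site root taken from the full URL used in the scrape_talk() example.
base_url <- "https://www.churchofjesuschrist.org"

# Hypothetical helper (not provided by generalconference):
# prefix relative paths with the site root, leave full URLs untouched.
make_absolute <- function(path, base = base_url) {
  ifelse(grepl("^https?://", path), path, paste0(base, path))
}

# Stand-in rows for illustration: one relative path, one already-absolute URL.
df_talk_urls <- tibble(
  talk_session_id = 1:2,
  talk_urls = c(
    "/study/general-conference/2021/04/49nelson?lang=eng",
    "https://www.churchofjesuschrist.org/study/general-conference/2021/04/49nelson?lang=eng"
  )
) %>%
  mutate(talk_url_full = make_absolute(talk_urls))

df_talk_urls$talk_url_full

# With the package loaded and network access, each talk could then be scraped
# (left commented out because it performs web requests):
# df_talks <- df_talk_urls %>%
#   mutate(talk = purrr::map(talk_url_full, scrape_talk))
```

Keeping the URL construction separate from the scraping step makes it easy to inspect the generated URLs before sending any requests.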