The data structure holds each talk's metadata in regular columns, alongside a nested data frame containing the paragraphs of its text.

Scrape a given talk

library(generalconference)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> Loading required package: furrr
#> Loading required package: future
#> Loading required package: glue
#> 
#> Attaching package: 'glue'
#> The following object is masked from 'package:dplyr':
#> 
#>     collapse
#> Loading required package: purrr
#> Loading required package: stringr
#> Loading required package: readr
#> Loading required package: rvest
#> 
#> Attaching package: 'rvest'
#> The following object is masked from 'package:readr':
#> 
#>     guess_encoding
#> Loading required package: tictoc
#> Loading required package: tidyr
#> Loading required package: xml2

url <- "https://www.churchofjesuschrist.org/study/general-conference/2021/04/49nelson?lang=eng"
scrape_talk(url)
#> # A tibble: 1 × 6
#>   url            title1      author1    author2       kicker1         paragraphs
#>   <chr>          <chr>       <chr>      <chr>         <chr>           <list>    
#> 1 https://www.c… Christ Is … By Presid… President of… Faith in Jesus… <tibble […

Simply unnest the paragraphs column to get all paragraphs for a talk:

scrape_talk(url) %>%
  tidyr::unnest(paragraphs)
#> # A tibble: 33 × 10
#>    url     title1   author1  author2  kicker1  section_num p_num p_id  is_header
#>    <chr>   <chr>    <chr>    <chr>    <chr>          <int> <int> <chr> <lgl>    
#>  1 https:… Christ … By Pres… Preside… Faith i…           0     1 p1    FALSE    
#>  2 https:… Christ … By Pres… Preside… Faith i…           0     2 p3    FALSE    
#>  3 https:… Christ … By Pres… Preside… Faith i…           0     3 p4    FALSE    
#>  4 https:… Christ … By Pres… Preside… Faith i…           0     4 p5    FALSE    
#>  5 https:… Christ … By Pres… Preside… Faith i…           0     5 p6    FALSE    
#>  6 https:… Christ … By Pres… Preside… Faith i…           0     6 p9    FALSE    
#>  7 https:… Christ … By Pres… Preside… Faith i…           0     7 p10   FALSE    
#>  8 https:… Christ … By Pres… Preside… Faith i…           0     8 p11   FALSE    
#>  9 https:… Christ … By Pres… Preside… Faith i…           0     9 p12   FALSE    
#> 10 https:… Christ … By Pres… Preside… Faith i…           0    10 p13   FALSE    
#> # … with 23 more rows, and 1 more variable: paragraph <chr>
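
Once unnested, the paragraphs are ordinary rows, so the usual dplyr verbs apply. The following is a minimal sketch rather than part of the package: it uses a small hand-built tibble (talk_paragraphs, with invented values) that mimics the columns shown above, and it assumes is_header marks section headings rather than body text. It drops the header rows and collapses the remaining paragraphs into one string per talk.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for scrape_talk(url) %>% tidyr::unnest(paragraphs).
# Column names follow the output above; every value here is invented.
talk_paragraphs <- tibble(
  url         = "https://example.org/talk",
  title1      = "Example Talk",
  section_num = c(0L, 0L, 1L, 1L),
  p_num       = 1:4,
  p_id        = c("p1", "p2", "p3", "p4"),
  is_header   = c(FALSE, FALSE, TRUE, FALSE),
  paragraph   = c("First paragraph.", "Second paragraph.",
                  "A Section Heading", "Third paragraph.")
)

# Drop header rows (assumption: is_header flags section headings),
# then collapse the body paragraphs into a single string per talk.
talk_text <- talk_paragraphs %>%
  filter(!is_header) %>%
  arrange(p_num) %>%
  group_by(url, title1) %>%
  summarise(
    n_paragraphs = n(),
    text = paste(paragraph, collapse = "\n\n"),
    .groups = "drop"
  )

talk_text
```

With real data, the same pipeline could presumably be applied directly to the unnested result of scrape_talk(url).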

Conference URLs

Pull all conference URLs for April 1971:

df_conference <- scrape_conference_urls(1971, 4)
df_conference
#> # A tibble: 1 × 3
#>    year month sessions        
#>   <dbl> <dbl> <list>          
#> 1  1971     4 <tibble [7 × 4]>

Unnest to see the sessions:

df_conference %>%
  unnest(sessions)
#> # A tibble: 7 × 6
#>    year month session_name               session_id session_url session_talk_ur…
#>   <dbl> <dbl> <chr>                           <int> <chr>       <list>          
#> 1  1971     4 Saturday Morning Session            1 /study/gen… <tibble [4 × 2]>
#> 2  1971     4 Saturday Afternoon Session          2 /study/gen… <tibble [7 × 2]>
#> 3  1971     4 Priesthood Session                  3 /study/gen… <tibble [7 × 2]>
#> 4  1971     4 Sunday Morning Session              4 /study/gen… <tibble [5 × 2]>
#> 5  1971     4 Sunday Afternoon Session            5 /study/gen… <tibble [7 × 2]>
#> 6  1971     4 Tuesday Morning Session             6 /study/gen… <tibble [5 × 2]>
#> 7  1971     4 Tuesday Afternoon Session           7 /study/gen… <tibble [7 × 2]>
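
Because session_talk_urls is a list column of tibbles, the number of talks in each session can be read off without unnesting further. The sketch below illustrates the idea on a hand-built tibble (sessions) shaped like the output above; the paths and counts in it are invented placeholders, not real conference data.

```r
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical stand-in for df_conference %>% unnest(sessions).
# The talk paths below are invented placeholders.
sessions <- tibble(
  session_name = c("Saturday Morning Session", "Saturday Afternoon Session"),
  session_id   = 1:2,
  session_talk_urls = list(
    tibble(talk_urls = c("/talk-a", "/talk-b"), talk_session_id = 1:2),
    tibble(talk_urls = c("/talk-c", "/talk-d", "/talk-e"), talk_session_id = 1:3)
  )
)

# Count the rows of each nested tibble to get the talks per session.
session_counts <- sessions %>%
  mutate(n_talks = map_int(session_talk_urls, nrow)) %>%
  select(session_id, session_name, n_talks)

session_counts
```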

Unnest again to see the individual talk URLs:

df_conference %>%
  unnest(sessions) %>%
  unnest(session_talk_urls)
#> # A tibble: 42 × 7
#>     year month session_name  session_id session_url  talk_urls   talk_session_id
#>    <dbl> <dbl> <chr>              <int> <chr>        <chr>                 <int>
#>  1  1971     4 Saturday Mor…          1 /study/gene… /study/gen…               1
#>  2  1971     4 Saturday Mor…          1 /study/gene… /study/gen…               2
#>  3  1971     4 Saturday Mor…          1 /study/gene… /study/gen…               3
#>  4  1971     4 Saturday Mor…          1 /study/gene… /study/gen…               4
#>  5  1971     4 Saturday Aft…          2 /study/gene… /study/gen…               1
#>  6  1971     4 Saturday Aft…          2 /study/gene… /study/gen…               2
#>  7  1971     4 Saturday Aft…          2 /study/gene… /study/gen…               3
#>  8  1971     4 Saturday Aft…          2 /study/gene… /study/gen…               4
#>  9  1971     4 Saturday Aft…          2 /study/gene… /study/gen…               5
#> 10  1971     4 Saturday Aft…          2 /study/gene… /study/gen…               6
#> # … with 32 more rows
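
The talk_urls values shown above are relative paths, whereas the scrape_talk() example earlier takes a full URL. One way to bridge the two, sketched here under assumptions, is to prepend the site host. The base_url value is taken from the talk URL used earlier; the talks tibble and its paths are hypothetical placeholders, and whether scrape_talk() accepts the resulting URLs unchanged (for example, without a lang query parameter) has not been verified here.

```r
library(dplyr)
library(tibble)

# Host taken from the talk URL used in the scrape_talk() example.
base_url <- "https://www.churchofjesuschrist.org"

# Hypothetical placeholders standing in for the relative talk_urls column.
talks <- tibble(
  talk_urls       = c("/study/example-path/talk-one",
                      "/study/example-path/talk-two"),
  talk_session_id = 1:2
)

# Build absolute URLs by prepending the host to each relative path.
talks_full <- talks %>%
  mutate(full_url = paste0(base_url, talk_urls))

talks_full$full_url
```

From there, something like purrr::map(talks_full$full_url, scrape_talk) could presumably scrape each talk in turn, though that step is left out of the sketch because it requires network access.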