Wednesday, 15 May 2013

regex - Count the number of words in open-ended responses in R




I have a dataset of 600 responses. The "free_text" variable contains feedback/comments from respondents. I want to calculate the number of words in the comments for each respondent. How should I do it? I am a new learner of R and am working in RStudio.

Consider using stri_extract_words from the stringi package, especially if you have non-English text. It uses ICU's BreakIterator for this task, which contains a list of sophisticated word-breaking rules.

    library(stringi)

    str <- c(
      "how many words are there?",
      "r — язык программирования для статистической обработки данных и работы с графикой, а также свободная программная среда вычислений с открытым исходным кодом в рамках проекта gnu."
    )

    stri_extract_words(str)
    ## [[1]]
    ## [1] "how"   "many"  "words" "are"   "there"
    ##
    ## [[2]]
    ##  [1] "r"                "язык"             "программирования" "для"              "статистической"
    ##  [6] "обработки"        "данных"           "и"                "работы"           "с"
    ## [11] "графикой"         "а"                "также"            "свободная"        "программная"
    ## [16] "среда"            "вычислений"       "с"                "открытым"         "исходным"
    ## [21] "кодом"            "в"                "рамках"           "проекта"          "gnu"

    # how many words are there in each character string?
    sapply(stri_extract_words(str), length)
    ## [1]  5 25
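To apply this to the dataset from the question, the counts can be stored as a new column next to the comments. The following is only a sketch: the data frame name `responses`, the `id` column, and the sample comments are hypothetical stand-ins for the real 600-row data, and only the `free_text` column name comes from the question. It uses `stri_count_words` from stringi, which counts words directly rather than extracting them first.

```r
library(stringi)

# Hypothetical stand-in for the real dataset; only the column name
# "free_text" is taken from the question.
responses <- data.frame(
  id = 1:3,
  free_text = c(
    "The service was quick and friendly.",
    "Too expensive",
    "No complaints at all, thank you!"
  ),
  stringsAsFactors = FALSE
)

# One word count per respondent, using the same ICU word-breaking rules
responses$word_count <- stri_count_words(responses$free_text)

print(responses[, c("id", "word_count")])
```

If the real column was read in as a factor, converting it with `as.character()` first is probably the safest choice.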

regex r pattern-matching
