Implementation of CLAN's VOCD command
Arguments
- tokens
Character. A vector of tokens.
- rep
Integer. The number of random samples drawn at each sample size in rng to calculate the mean type-token ratio (TTR) for that size (by default, sizes from 35 to 50 tokens).
- rng
Integer vector. The sample sizes (in tokens) used for calculating the D measure. By default, 35:50, which are the values used in the VOCD program.
- as_CLAN
Logical. See Details. By default, TRUE, which corresponds to the default behavior in CLAN.
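The following minimal Python sketch (not the package's R code) shows how rep and rng interact. For every sample size n in rng, it draws rep random samples of n tokens without replacement and averages their TTRs. The function name mean_ttr_by_size, the default rep = 100, and the seed argument are assumptions for illustration only.

```python
import random

def mean_ttr_by_size(tokens, rep=100, rng=range(35, 51), seed=None):
    """Hypothetical helper: mean TTR at each sample size in `rng`.

    For each size n, draw `rep` random samples of n tokens (without
    replacement) and average their type-token ratios.
    The default rep=100 is an assumption, not the package default.
    """
    if len(tokens) < max(rng):
        raise ValueError("need at least max(rng) tokens")
    r = random.Random(seed)
    result = {}
    for n in rng:
        ttrs = []
        for _ in range(rep):
            sample = r.sample(tokens, n)      # n tokens, no replacement
            ttrs.append(len(set(sample)) / n)  # types / tokens
        result[n] = sum(ttrs) / rep
    return result
```

A text whose tokens are all distinct yields a mean TTR of 1 at every size, while a text that repeats one word yields 1/n.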
Details
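This function follows the algorithm described in the CLAN manual (2024) and Durán et al. (2004). The equation below relates the TTR of a sample of N tokens to the VOCD measure D. D is estimated as the value whose curve best fits, in the least-squares sense, the mean TTRs observed at the sample sizes in rng.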
$$TTR = \frac{D}{N} \left[ \left( 1 + 2 \frac{N}{D} \right)^{\frac{1}{2}} - 1 \right]$$
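The equation can be evaluated directly, and D can then be estimated by least squares. The Python sketch below does both. The names ttr_model and fit_d are hypothetical. The optimizer is a golden-section search over an assumed bracket of [1, 1000], which is a swapped-in technique; CLAN's own fitting routine may differ. Because the model TTR increases monotonically with D, the squared error is unimodal whenever all observations come from one curve, so a bracketing search is adequate for this illustration.

```python
import math

def ttr_model(d, n):
    """Theoretical TTR of an n-token sample for a given D (equation above)."""
    return (d / n) * (math.sqrt(1 + 2 * n / d) - 1)

def fit_d(mean_ttrs, lo=1.0, hi=1000.0, iters=200):
    """Return the D in [lo, hi] minimising the squared error between the
    observed mean TTRs ({sample_size: mean_ttr}) and the model curve."""
    def sse(d):
        return sum((ttr_model(d, n) - t) ** 2 for n, t in mean_ttrs.items())

    g = (math.sqrt(5) - 1) / 2  # golden-section ratio
    a, b = lo, hi
    for _ in range(iters):
        c, e = b - g * (b - a), a + g * (b - a)
        if sse(c) < sse(e):
            b = e  # minimum lies in [a, e]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2

print(round(ttr_model(50, 50), 4))  # 0.7321, i.e. sqrt(3) - 1
curve = {n: ttr_model(72.0, n) for n in range(35, 51)}
print(round(fit_d(curve), 3))       # 72.0, the D that generated the curve
```

If every observed TTR is 1, the best fit runs to the upper bound of the bracket. A real implementation would need to decide how to report that case.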
Note that CLAN runs the VOCD procedure three times and uses the average of the three D values as the final D measure. This behavior is reproduced by setting as_CLAN = TRUE in VOCD(). Otherwise, the D measure is calculated only once.
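The full procedure, including the three-run averaging, is sketched below in self-contained Python. This is an illustration under assumptions, not the package's implementation. The names vocd_sketch and _one_d, the default rep = 100, the seed argument, and the golden-section fit over [1, 1000] are all hypothetical choices. Each run draws fresh random samples, so the three D values differ slightly, and averaging them reduces sampling noise.

```python
import math
import random

def _ttr_model(d, n):
    # Theoretical TTR of an n-token sample for parameter D.
    return (d / n) * (math.sqrt(1 + 2 * n / d) - 1)

def _one_d(tokens, rep, rng, r):
    # Mean TTR at each sample size from `rep` samples drawn without replacement.
    means = {
        n: sum(len(set(r.sample(tokens, n))) / n for _ in range(rep)) / rep
        for n in rng
    }

    def sse(d):
        return sum((_ttr_model(d, n) - t) ** 2 for n, t in means.items())

    # Least-squares fit of D by golden-section search on an assumed [1, 1000].
    g = (math.sqrt(5) - 1) / 2
    a, b = 1.0, 1000.0
    for _ in range(200):
        c, e = b - g * (b - a), a + g * (b - a)
        if sse(c) < sse(e):
            b = e
        else:
            a = c
    return (a + b) / 2

def vocd_sketch(tokens, rep=100, rng=range(35, 51), as_clan=True, seed=None):
    """Hypothetical sketch: average D over three runs if as_clan, else one run."""
    r = random.Random(seed)
    runs = 3 if as_clan else 1
    ds = [_one_d(tokens, rep, rng, r) for _ in range(runs)]
    return sum(ds) / len(ds)
```

Usage might look like `vocd_sketch(words, rep=100, seed=1)`, where `words` is a list of at least 50 tokens.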
References
MacWhinney, B. (2000). The CHILDES Project: Tools for Analyzing Talk (3rd ed.).
MacWhinney, B. (2024). CLAN Manual. TalkBank. https://doi.org/10.21415/T5G10R (pp. 111–114).
Durán, P., Malvern, D., Richards, B., & Chipere, N. (2004). Developmental trends in lexical diversity. Applied Linguistics, 25(2), 220–242. https://doi.org/10.1093/applin/25.2.220