
Implementation of CLAN's VOCD command

Usage

VOCD(tokens, rep = 100, rng = 35:50, as_CLAN = T)

Arguments

tokens

Character. A vector of tokens.
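
To illustrate the expected input, the sketch below turns a single utterance into a token vector. The whitespace tokenizer is an assumption made only for this example. CLAN applies its own conventions when deciding what counts as a word, so real input should follow those.

```r
# Assumed, minimal tokenizer for illustration: lower-case, split on whitespace.
utterance <- "the dog chased the cat and the cat ran"
tokens <- strsplit(tolower(utterance), "\\s+")[[1]]

length(tokens)          # 9 tokens
length(unique(tokens))  # 6 types, so the overall TTR is 6/9
```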

rep

Integer. The number of random resamplings used to compute the mean type-token ratio (TTR) at each sample size in rng (by default, 35 to 50 tokens).
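
The resampling step controlled by rep can be sketched as follows. mean_ttr() and toy are hypothetical names for this illustration, not functions or data exported by the package. The sketch assumes that each subsample is drawn at random, without replacement, from the full token vector.

```r
# Hypothetical helper (not part of the package): mean TTR over `rep`
# random subsamples of `n` tokens, each drawn without replacement.
mean_ttr <- function(tokens, n, rep = 100) {
  stopifnot(length(tokens) >= n)
  ttrs <- replicate(rep, {
    s <- sample(tokens, n)   # one random subsample of size n
    length(unique(s)) / n    # its type-token ratio
  })
  mean(ttrs)
}

set.seed(1)
toy <- rep(c("the", "dog", "ran", "a", "cat", "sat"), length.out = 60)
mean_ttr(toy, n = 40)
```

Averaging over many subsamples smooths out the luck of any single draw; a larger rep gives a more stable mean at the cost of run time.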

rng

Integer vector. The sample sizes (in tokens) used to calculate the D measure. By default, 35:50, which are the values used by the VOCD program.
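
Repeating the mean-TTR calculation for every size in rng yields the empirical TTR curve to which D is then fitted. ttr_curve() is a hypothetical helper written for illustration; it assumes random sampling without replacement and at least max(rng) tokens in the input.

```r
# Hypothetical helper: one mean TTR per sample size in `rng`.
ttr_curve <- function(tokens, rng = 35:50, rep = 100) {
  stopifnot(length(tokens) >= max(rng))
  vapply(rng, function(n) {
    mean(replicate(rep, length(unique(sample(tokens, n))) / n))
  }, numeric(1))
}

set.seed(42)
toy <- sample(c(letters, LETTERS), 200, replace = TRUE)  # synthetic tokens
obs <- ttr_curve(toy)
round(obs, 3)  # mean TTR tends to fall as the sample size grows
```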

as_CLAN

Logical. See Details. By default, TRUE, which reproduces the default behavior of CLAN.

Details

This function follows the algorithm described in the CLAN manual (MacWhinney, 2024, pp. 111–114) and Durán et al. (2004). The equation below relates the VOCD measure D to the TTR of a sample of N tokens:

$$TTR = \frac{D}{N} \left[ \left( 1 + 2 \frac{N}{D} \right)^{\frac{1}{2}} - 1 \right]$$
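
The equation translates directly into code. The sketch below evaluates it and then estimates D by least squares, choosing the D whose theoretical curve lies closest to an observed TTR curve. model_ttr() and fit_D() are hypothetical names; the least-squares criterion, the search interval, and the use of base R's optimize() are assumptions, and CLAN's own curve-fitting routine may differ in detail.

```r
# Theoretical TTR for samples of N tokens, given the diversity parameter D
model_ttr <- function(D, N) (D / N) * (sqrt(1 + 2 * N / D) - 1)

# Worked example: D = 25, N = 50 gives 0.5 * (sqrt(5) - 1)
model_ttr(25, 50)  # 0.618034

# Assumed least-squares fit of D to an observed TTR curve over `rng`
fit_D <- function(observed, rng = 35:50, interval = c(1, 1000)) {
  sse <- function(D) sum((model_ttr(D, rng) - observed)^2)
  optimize(sse, interval = interval)$minimum
}

# Sanity check: a noise-free curve generated with D = 60 is recovered
fit_D(model_ttr(60, 35:50))  # approximately 60
```

For a fixed N, a larger D always yields a higher theoretical TTR, which is why a higher D indicates greater lexical diversity.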

Note that CLAN runs the D calculation three times and reports the average of the three D values as the final measure. Setting as_CLAN = TRUE in VOCD() reproduces this behavior; otherwise, D is calculated only once.
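
Putting the pieces together, the sketch below mirrors the overall procedure and the effect of as_CLAN: with as_CLAN = TRUE the whole resample-and-fit cycle runs three times and the three D estimates are averaged. vocd_sketch() is hypothetical and is not the package's VOCD(); sampling without replacement, the least-squares fit via optimize(), and its search interval are assumptions.

```r
# Hypothetical end-to-end sketch; NOT the package's VOCD().
# Assumptions: sampling without replacement, least-squares fit of D via
# optimize() on an arbitrary search interval.
vocd_sketch <- function(tokens, rep = 100, rng = 35:50, as_CLAN = TRUE) {
  stopifnot(length(tokens) >= max(rng))
  model_ttr <- function(D, N) (D / N) * (sqrt(1 + 2 * N / D) - 1)

  one_run <- function() {
    # Empirical curve: mean TTR of `rep` random subsamples per size
    observed <- vapply(rng, function(n) {
      mean(replicate(rep, length(unique(sample(tokens, n))) / n))
    }, numeric(1))
    # D whose theoretical curve is closest in squared error
    sse <- function(D) sum((model_ttr(D, rng) - observed)^2)
    optimize(sse, interval = c(1, 1000))$minimum
  }

  runs <- if (as_CLAN) 3 else 1  # CLAN reports the mean of three runs
  mean(vapply(seq_len(runs), function(i) one_run(), numeric(1)))
}

set.seed(7)
toy <- sample(c(letters, LETTERS), 200, replace = TRUE)  # synthetic tokens
vocd_sketch(toy)                   # average of three D estimates
vocd_sketch(toy, as_CLAN = FALSE)  # single D estimate
```

Because each run draws fresh random subsamples, individual D estimates vary slightly; averaging three runs reduces that run-to-run variation.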

References

MacWhinney, Brian. The CHILDES Project: Tools for Analyzing Talk. 3rd ed. 2000.

MacWhinney, Brian. “CLAN Manual.” TalkBank, 2024. https://doi.org/10.21415/T5G10R (pp. 111–114).

Durán, Pilar, David Malvern, Brian Richards, and Ngoni Chipere. “Developmental Trends in Lexical Diversity.” Applied Linguistics 25, no. 2 (June 1, 2004): 220–42. https://doi.org/10.1093/applin/25.2.220.