dichotomize (netCoin): Dichotomize

This function converts one or more factor or character columns of a data frame into a set of dichotomous (0/1) columns. The new columns are named after the label or text of each category.
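The core idea can be illustrated with a short base-R sketch. It is not netCoin's implementation: the function name dichotomize_sketch is hypothetical, and it ignores min, length, values, sparse, add and nas. It only splits each cell on sep and builds one 0/1 column per category, ordered by decreasing frequency.

```r
# Hypothetical base-R sketch of the idea behind dichotomize(); NOT netCoin's code.
# It ignores min, length, values, sparse, add and nas.
dichotomize_sketch <- function(x, sep = "; ") {
  x <- as.character(x)
  # Split each cell into its components; with sep = "" each cell is one value
  parts <- if (sep == "") as.list(x) else strsplit(x, sep, fixed = TRUE)
  # Count each category once per case (NA is dropped by table())
  counts <- table(unlist(lapply(parts, unique)))
  # Column order: decreasing frequency; ties keep table()'s sorted-label order
  cats <- names(counts)[order(as.vector(counts), decreasing = TRUE)]
  # One 0/1 column per category: 1 if the case contains that category
  hits <- vapply(cats,
                 function(k) vapply(parts, function(p) as.integer(k %in% p), integer(1)),
                 integer(length(parts)))
  hits <- matrix(hits, nrow = length(parts), dimnames = list(NULL, cats))
  as.data.frame(hits)
}
```

Applied to the character column of the second example below, it returns the same Man, Women and Undet. indicators, without the original column and the A:None column.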

dichotomize(data, variables,
            sep = "", min = 1, length = 0, values = NULL,
            sparse = FALSE, add = TRUE, sort = TRUE, nas = "None")

Arguments

data	a data frame with a factor or character column, which can be simple (only one value per scenario) or multiple, when several values in a cell are delimited by a separator.
variables	vector of names of the columns to be converted into dichotomous columns.
sep	character string used to split cells that contain multiple values. If the separator is "", every unique cell value of every column is converted into a dichotomous column of the data frame.
min	minimum frequency: only labels or texts with a frequency greater than or equal to this value are converted into dichotomous columns. If min is between 0 and 1, it is interpreted as a proportion of cases.
length	maximum number of dichotomous columns generated for each variable.
values	vector of labels or texts selected for conversion into dichotomous columns.
sparse	produce a sparse matrix instead of a data.frame
add	add the new columns to the input data.frame
sort	order the new columns by decreasing frequency.
nas	name given to the dichotomous column that records the NA values of the selected variables (by default, "None").
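How min, length and sort might interact when choosing which categories get a column can be sketched as follows. select_categories is a hypothetical helper, not part of netCoin; min_freq, max_cols and sort_desc stand in for min, length and sort. Reading a fractional min as a share of cases and length = 0 as "no limit" are assumptions based on the argument descriptions above.

```r
# Hypothetical helper (NOT part of netCoin) mirroring min, length and sort:
# it returns the names of the categories that would receive a 0/1 column.
select_categories <- function(counts, n_cases, min_freq = 1, max_cols = 0,
                              sort_desc = TRUE) {
  # Accept a table or a named vector; keep only values and names
  counts <- setNames(as.vector(counts), names(counts))
  # Assumption: a value strictly between 0 and 1 is a proportion of cases
  if (min_freq > 0 && min_freq < 1) min_freq <- min_freq * n_cases
  kept <- counts[counts >= min_freq]
  # Decreasing frequency; order() is stable here, so ties keep input order
  if (sort_desc) kept <- kept[order(kept, decreasing = TRUE)]
  # Assumption: max_cols = 0 means no limit, like the default length = 0
  if (max_cols > 0) kept <- kept[seq_len(min(max_cols, length(kept)))]
  names(kept)
}
```

For example, with counts c(Man = 3, Undet. = 1, Women = 3) over 4 cases, a min_freq of 0.5 keeps only Man and Women, and max_cols = 1 keeps only Man.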

Value

A data frame composed of the original columns plus the added dichotomous columns (with the defaults add = TRUE and sparse = FALSE).

References

Escobar, M. and Martinez-Uribe, L. (2020) Network Coincidence Analysis: The netCoin R Package. Journal of Statistical Software, 93, 1-32. doi:10.18637/jss.v093.i11.

Author

Modesto Escobar, Department of Sociology and Communication, University of Salamanca, and Luis Martinez Uribe, Fundacion Juan March. See https://sociocav.usal.es/blog/modesto-escobar/

Examples

# A character column
frame1 <- data.frame(A = c("Man", "Women", "Man", "Undet."))
dichotomize(frame1, "A", sep = "; ")
#>         A Man Undet. Women A:None
#> V1    Man   1      0     0      0
#> V2  Women   0      0     1      0
#> V3    Man   1      0     0      0
#> V4 Undet.   0      1     0      0

# A character column (with separator)
frame2 <- data.frame(A = c("Man; Women", "Women; Women",
                         "Man; Man", "Undet.; Women; Man"))
dichotomize(frame2, "A", sep = "; ")
#>                     A Man Women Undet. A:None
#> V1         Man; Women   1     1      0      0
#> V2       Women; Women   0     1      0      0
#> V3           Man; Man   1     0      0      0
#> V4 Undet.; Women; Man   1     1      1      0

# A character column and a factor column (same separator)
frame3 <- data.frame(A = c("Man; Women", "Women; Women",
                         "Man; Man", "Undet.; Women; Man"),
                     C = factor(c(1:4), labels = c("Paris", "New York",
                         "London; New York", "<NA>")))
dichotomize(frame3, c("A", "C"), sep = "; ")
#>                     A                C A:Man A:Women A:Undet. A:None C:New York
#> V1         Man; Women            Paris     1       1        0      0          0
#> V2       Women; Women         New York     0       1        0      0          1
#> V3           Man; Man London; New York     1       0        0      0          1
#> V4 Undet.; Women; Man             <NA>     1       1        1      0         NA
#>    C:None C:London C:Paris
#> V1      0        0       1
#> V2      0        0       0
#> V3      0        1       0
#> V4     NA       NA      NA

# A set of simple character or factor variables sharing the same levels.
# In this case, you must use sep = "C".
frame4 <- data.frame(A = c("Man", "Women", "Man", "Undet", NA),
                     B = c("Women", "Women", "Man", "Women", NA),
                     C = c(NA, NA, NA, "Man", NA))
dichotomize(frame4, c("A", "B", "C"), sep = "C")
#>        A     B    C Man Women None Undet String:None
#> V1   Man Women <NA>   1     1    0     0           0
#> V2 Women Women <NA>   0     1    0     0           0
#> V3   Man   Man <NA>   1     0    0     0           0
#> V4 Undet Women  Man   1     1    0     1           0
#> V5  <NA>  <NA> <NA>   0     0    1     0           0
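Pooling several columns that share the same levels, as in the last example, can be sketched in base R. pool_dichotomize is hypothetical and is not netCoin's code. Flagging rows in which every selected column is NA in a None column is an assumption based on row V5 of the output above, and the String:None column is not reproduced.

```r
# Hypothetical base-R sketch (NOT netCoin's code): pool several columns that
# share the same levels into one 0/1 column per level, plus an all-NA flag.
pool_dichotomize <- function(df, cols, na_label = "None") {
  # For each row, the set of non-missing values across the selected columns
  sets <- lapply(seq_len(nrow(df)), function(i) {
    v <- unlist(lapply(df[i, cols, drop = FALSE], as.character))
    unique(v[!is.na(v)])
  })
  # Levels in order of first appearance
  levs <- unique(unlist(sets))
  out <- vapply(levs,
                function(k) vapply(sets, function(s) as.integer(k %in% s), integer(1)),
                integer(length(sets)))
  out <- matrix(out, nrow = length(sets), dimnames = list(NULL, levs))
  res <- data.frame(out, check.names = FALSE)
  # Assumption: 1 when every selected column of the row is NA
  res[[na_label]] <- as.integer(lengths(sets) == 0)
  res
}
```

With frame4, it returns the Man, Women, Undet and None indicators shown above.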