How to split a vector into sub-vectors of different lengths using R
Daniel James
I want to split a vector into sub-vectors of different lengths by specifying the average length of the sub-vectors.
I found an answer that gave some clues, but it does not produce the result I want:
ts <- 1:23 # the parent vector
bs <- 3 # length of each sub-vector
nb <- length(ts) / bs # number of sub-vectors
split(ts, rep(1:nb, each=bs, length.out = length(ts)))
#$`1`
#[1] 1 2 3
#$`2`
#[1] 4 5 6
#$`3`
#[1] 7 8 9
#$`4`
#[1] 10 11 12
#$`5`
#[1] 13 14 15
#$`6`
#[1] 16 17 18
#$`7`
#[1]
What I want instead is sub-vectors whose lengths have a mean of 4 and a variance of 2.
The kind of result I am after looks like this:
#$`1`
#[1] 1 2
#$`2`
#[1] 3 4 5 6
#$`3`
#[1] 7 8 9
#$`4`
#[1] 10
#$`5`
#[1] 11 12 13 14 15 16
#$`6`
#[1] 17 18
#$`7`
#[1] 19 20 21 22 23
Jay
We can create a normally distributed density vector dens with length(ts), mean 4 and variance 2. From this, we calculate the probabilities probs used to draw a random sample() of lengths for ts. From this, we can sample bins of the desired length to split() ts. To ensure that the bins actually have the desired mean and variance, we can pack the whole thing into a repeat loop that runs until all.equal(), with a specific tolerance (tol) setting, yields TRUE.
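To make the stopping rule concrete, here is a small sketch of the all.equal() test on its own. The sizes vector is only an illustration (it matches the lengths of the result shown further down), and rel_diff is my own hand computation of what I understand all.equal() to report for numeric vectors: a mean relative difference pooled over both values, so the mean and the variance are not each held to within 10% separately.

```r
# Example sub-vector lengths, used here only for illustration
sizes <- c(4, 4, 6, 3, 2, 4)

observed <- c(mean(sizes), var(sizes))  # roughly 3.83 and 1.77
desired  <- c(4, 2)

# Pooled relative difference: total absolute deviation over total |observed|
rel_diff <- sum(abs(observed - desired)) / sum(abs(observed))
rel_diff  # about 0.071, below the tolerance of 0.1

# all.equal() returns TRUE or a character description of the difference,
# so isTRUE() is the safe way to use it inside if()
accepted <- isTRUE(all.equal(observed, desired, tolerance = 0.1))
accepted
```

A smaller tolerance makes the loop pickier and therefore slower, and with only 23 elements an exact mean of 4 is impossible for most bin counts, so some slack is needed.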
ts <- 1:23 # the parent vector
bs <- 3 # length of each sub-vector
nb <- length(ts) / bs # number of sub-vectors (not a whole number here)
set.seed(69429)
repeat {
  dens <- dnorm(ts, mean=4, sd=sqrt(2)) # normal density with mean 4, variance 2
  probs <- dens/sum(dens) # rescale to probabilities
  # sample bin sizes (sizes is not used below, but the draw advances the RNG)
  sizes <- sample(length(ts), size=nb, replace=TRUE, prob=probs)
  bins <- as.numeric(sort(factor(
    sample(nb, length(ts), replace=TRUE), # sample a bin label for each element
    levels=1:length(ts))))
  # stop once the mean and variance of the bin sizes are close enough to 4 and 2
  if (isTRUE(all.equal(c(mean(table(bins)), var(table(bins))), c(4, 2), tolerance=.1))) {
    break
  }
}
bins
# [1] 1 1 1 1 2 2 2 2 3 3 3 3 3 3 4 4 4 5 5 6 6 6 6
Split ts by the sampled bins:
(S <- split(ts, as.numeric(bins)))
# $`1`
# [1] 1 2 3 4
#
# $`2`
# [1] 5 6 7 8
#
# $`3`
# [1] 9 10 11 12 13 14
#
# $`4`
# [1] 15 16 17
#
# $`5`
# [1] 18 19
#
# $`6`
# [1] 20 21 22 23
Check the mean and variance of the sub-vector lengths:
c(mean=mean(lengths(S)), var=var(lengths(S)))
# mean var
# 3.833333 1.766667
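As a possible variation, the sub-vector lengths can also be drawn directly and then used to split the vector. This is a rough sketch rather than the approach above: split_by_random_sizes() and its arguments mean_len, var_len and max_tries are names I made up, and rounding normal draws and rejecting those that do not sum to length(x) is an assumption of mine, so the resulting lengths only approximately follow the requested mean and variance.

```r
# Sketch: draw sub-vector lengths directly, then split by those lengths.
# split_by_random_sizes() and its arguments are hypothetical names.
split_by_random_sizes <- function(x, mean_len, var_len, max_tries = 10000) {
  n <- length(x)
  k <- max(1, round(n / mean_len))  # number of sub-vectors to aim for
  for (i in seq_len(max_tries)) {
    # rounded normal draws, forced to be at least 1
    sizes <- pmax(1, round(rnorm(k, mean = mean_len, sd = sqrt(var_len))))
    if (sum(sizes) == n) {  # keep only draws that use every element exactly once
      return(split(x, rep(seq_along(sizes), times = sizes)))
    }
  }
  stop("no size vector summing to length(x) found within max_tries draws")
}

set.seed(1)
S2 <- split_by_random_sizes(1:23, mean_len = 4, var_len = 2)
lengths(S2)
c(mean = mean(lengths(S2)), var = var(lengths(S2)))
```

Because the lengths must sum to length(x), their mean is fixed at length(x) divided by the number of sub-vectors; only the variance changes from draw to draw. If the variance has to be close to the target as well, the same all.equal() test can be added to the acceptance condition.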