Nucleosome positioning data
The nucleosome positioning data set is a set of vectors of human nucleosome positioning signals around transcription start sites (TSSs).
From the GENCODE database, version 7 (Harrow, et al., 2012), we obtained human nucleosome positioning signals collected using MNase-sequencing and TSSs on the human reference genome hg19. We iteratively merged neighboring TSSs within 1000 bp into groups, and from each group we selected the TSS with the maximal expression level as its representative. From the representative TSSs, we excluded those having any other TSS within 1000 bp on the reverse strand, to eliminate the effect of those TSSs. Subsequently, from the nucleosome positioning signal data, we generated 56,772 vectors of dimension 101, 201, 501, 1001 and 2001, whose elements cover the positions within 50, 100, 250, 500 and 1000 bp of a representative TSS, respectively, and in which more than half of the elements were nonzero.
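The grouping, filtering and window extraction described above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the tuple layout `(chrom, strand, pos, expression)`, the function names, the choice to group per chromosome and strand, and the treatment of a distance of exactly 1000 bp as "within 1000 bp" are all assumptions made for the example. Strand-aware reversal of the window is not handled.

```python
from itertools import groupby

def representative_tss(tss_list, max_gap=1000):
    """Chain-merge neighboring TSSs and keep the most expressed one per group.

    tss_list: iterable of (chrom, strand, pos, expression) tuples
    (a hypothetical layout chosen for this sketch).
    """
    reps = []
    ordered = sorted(tss_list, key=lambda t: (t[0], t[1], t[2]))
    for _, same_strand in groupby(ordered, key=lambda t: (t[0], t[1])):
        group = []
        for tss in same_strand:
            # A gap larger than max_gap closes the current group.
            if group and tss[2] - group[-1][2] > max_gap:
                reps.append(max(group, key=lambda t: t[3]))
                group = []
            group.append(tss)
        reps.append(max(group, key=lambda t: t[3]))
    return reps

def drop_opposite_strand_neighbors(reps, all_tss, max_gap=1000):
    """Remove representatives with any TSS within max_gap on the other strand."""
    kept = []
    for chrom, strand, pos, expr in reps:
        clash = any(
            c == chrom and s != strand and abs(p - pos) <= max_gap
            for c, s, p, _ in all_tss
        )
        if not clash:
            kept.append((chrom, strand, pos, expr))
    return kept

def window(signal, pos, half_width):
    """Return the 2 * half_width + 1 signal values centered on pos.

    signal is assumed to be a list indexed by genomic coordinate.
    """
    return signal[pos - half_width : pos + half_width + 1]

def mostly_nonzero(vector):
    """True if more than half of the elements are nonzero."""
    return sum(1 for x in vector if x != 0) > len(vector) / 2
```

With half-widths of 50, 100, 250, 500 and 1000 bp, `window` yields vectors of length 101, 201, 501, 1001 and 2001, matching the dimensions given above.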
We also generated vectors of smaller dimension d = 5, 10, 20 and 50. For d = 10, 20 and 50, we selected every (2000/d)-th element around the TSS; e.g., elements at -1000, -800, -600, ..., 0, ..., +600 and +800 bp for d = 10. For d = 5, we selected every 500th element around the TSS, i.e., elements at -1000, -500, 0, +500 and +1000 bp around representative TSSs. We excluded representative TSSs unless more than half of the elements were nonzero in the small-dimensional vectors, and the number of TSSs for d = 5, 10, 20 and 50 is 55,587.
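The subsampling rule can be checked with a short sketch. The function names are hypothetical, and the code assumes that the full vector has 2001 elements with index i corresponding to offset i - 1000 bp from the TSS. Note that the d = 5 case includes both endpoints, whereas the other cases stop before +1000 bp, as in the examples above.

```python
def subsample_offsets(d, half_width=1000):
    """Offsets (bp relative to the TSS) kept for a d-dimensional vector."""
    if d == 5:
        # Special case: step of 500 bp, both endpoints included.
        return list(range(-half_width, half_width + 1, 500))
    # Every (2000/d)-th element, starting at -1000 bp; +1000 bp is excluded.
    step = 2 * half_width // d
    return list(range(-half_width, half_width, step))

def subsample(vector, d, half_width=1000):
    """Pick the d elements of a (2 * half_width + 1)-element vector."""
    return [vector[o + half_width] for o in subsample_offsets(d, half_width)]
```

For example, `subsample_offsets(10)` returns the ten offsets from -1000 to +800 bp in steps of 200 bp, and `subsample_offsets(5)` returns -1000, -500, 0, +500 and +1000 bp.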