13

I have a 10GB Data.Vector.Unboxed vector that I want to efficiently save to disk. What's the best, most efficient way? I plan to read it from a memory-mapped file too.

I have seen this package, but it only works with Storable vectors and I need to stay with unboxed ones.

I was thinking of converting to a list, but I assume that is far from ideal.

4
  • Why do you need to stay with Unboxed? As I've pointed out before, I'm not aware of any difference between the two.
    – crockeea
    Apr 24, 2014 at 1:19
  • I couldn't experiment with Storable because it doesn't support zip and my code relies on that. I could use zipWith, but that would involve major refactoring.
    – jap
    Apr 24, 2014 at 8:20
  • Also, one of the big differences is that you have to add Storable instances everywhere. If I want to zip, for example, I would need to add instances for (Int, Int), and I would need to add a vast amount
    – jap
    Apr 24, 2014 at 8:49
  • Of course Storable vectors support zipping: link. And just as you need Storable instances for new data types, you also need Unbox instances to use them with unboxed vectors. The two are quite similar. If you're looking for a Storable tuple instance specifically, there's a package here.
    – crockeea
    Apr 24, 2014 at 12:45

3 Answers

5

You can convert between Vector types at the cost of an O(n) traversal of the entire vector. The function you're looking for is convert. As long as you're not planning to write this vector out to disk often, this cost should not be significant overall, and the conversion is certainly faster than actually writing the vector out to disk. However, if you find yourself paying this cost often, you should probably rethink the algorithm.
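As a minimal sketch of that conversion, the following uses convert from Data.Vector.Generic to go from an unboxed vector to a storable one and back. The element type Double and the helper names toStorable and toUnboxed are assumptions made for illustration; the question does not say what the vector holds.

```haskell
import qualified Data.Vector.Generic as G
import qualified Data.Vector.Storable as S
import qualified Data.Vector.Unboxed as U

-- Hypothetical helper: build a storable vector from an unboxed one
-- in a single O(n) pass. Double is an assumed element type.
toStorable :: U.Vector Double -> S.Vector Double
toStorable = G.convert

-- Hypothetical helper: the reverse direction, e.g. after the data
-- has been read back from disk as a storable vector.
toUnboxed :: S.Vector Double -> U.Vector Double
toUnboxed = G.convert
```

Any element type that has both Unbox and Storable instances should work the same way; only the type signatures change.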

2
  • I plan to write to disk once every day
    – jap
    Apr 23, 2014 at 16:17
  • Then my answer should be sufficient. It's one extra O(n) pass over the elements in the vector (note that this only deals with the references to the data; it's not touching all 10GB) right before you write.
    Apr 23, 2014 at 18:19
4

I haven't tested it myself, but you could try to use vector-binary-instances, which provides Binary instances for Vectors, and then use binary, for example encodeFile.
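A rough sketch of this approach, assuming the vector-binary-instances and binary packages are installed. Importing Data.Vector.Binary brings the Binary instances for vectors into scope; the helper names saveVector and loadVector and the Double element type are made up for the example.

```haskell
import Data.Binary (decodeFile, encodeFile)
import Data.Vector.Binary ()  -- instances only (from vector-binary-instances)
import qualified Data.Vector.Unboxed as U

-- Hypothetical helper: serialise an unboxed vector to a file.
-- Double is an assumed element type.
saveVector :: FilePath -> U.Vector Double -> IO ()
saveVector = encodeFile

-- Hypothetical helper: read the vector back by decoding the file.
loadVector :: FilePath -> IO (U.Vector Double)
loadVector = decodeFile
```

Note that decodeFile parses the file into a fresh vector rather than memory-mapping it, so this covers the saving half of the question more directly than the memory-mapped read.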

0
0

What about memory mapping the C array underlying the vector? Of course that works only if the Vector is unboxed :-).

Writing would then consist of taking the pointer to the array and its total size in bytes, and writing that chunk of memory with a single C call.
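One way to sketch the pointer-and-size idea from Haskell is shown below. It is an assumption-laden illustration, not a memory-mapping implementation: it uses a Storable vector (whose buffer can be exposed as a pointer through unsafeWith) and plain hPutBuf/hGetBuf from base instead of mmap. The names writeRaw and readRaw and the Double element type are hypothetical, and the file holds raw machine-layout bytes, so it is not portable across architectures.

```haskell
import Foreign.ForeignPtr (mallocForeignPtrArray, withForeignPtr)
import Foreign.Storable (sizeOf)
import System.IO (IOMode (..), hFileSize, hGetBuf, hPutBuf, withBinaryFile)
import qualified Data.Vector.Storable as S

-- Size in bytes of one element (Double is an assumed element type).
elemSize :: Int
elemSize = sizeOf (undefined :: Double)

-- Hypothetical helper: write the vector's underlying buffer in one call.
writeRaw :: FilePath -> S.Vector Double -> IO ()
writeRaw path v =
  withBinaryFile path WriteMode $ \h ->
    S.unsafeWith v $ \ptr ->
      hPutBuf h ptr (S.length v * elemSize)

-- Hypothetical helper: read the raw bytes back into a new vector.
readRaw :: FilePath -> IO (S.Vector Double)
readRaw path =
  withBinaryFile path ReadMode $ \h -> do
    bytes <- hFileSize h
    let n = fromIntegral bytes `div` elemSize
    fp <- mallocForeignPtrArray n
    got <- withForeignPtr fp $ \ptr -> hGetBuf h ptr (n * elemSize)
    -- Keep only the elements that were read completely.
    pure (S.unsafeFromForeignPtr0 fp (got `div` elemSize))
```

An unboxed vector would first have to be converted to a storable one before writeRaw can be applied to it.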
