Follow the full discussion on Reddit.
Hi guys, I have made some modifications to the Llama2 repository to utilize TPU v3-8 hardware, so it can perform Llama2 7B (and even 13B) chat-completion inference without graph recompilation. It is still slower than an Nvidia P100 when generating text at batch size 1, so it is not suitable for real-time inference, but (TPU being TPU) it shines with batched text generation. I used it to generate large amounts of text for research purposes. Hope it benefits the community.
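The "without graph recompilation" part usually comes down to keeping input shapes static: XLA traces and compiles one graph per unique tensor shape, so padding prompts to a small fixed set of "bucket" lengths lets the TPU reuse a compiled graph instead of recompiling for every prompt length. A minimal sketch of that idea (the bucket sizes, `PAD_ID`, and function names here are illustrative assumptions, not taken from the actual repository):

```python
# Hypothetical sketch: pad variable-length prompts to fixed "bucket" shapes so
# an XLA-compiled model (e.g. on TPU) is traced/compiled once per bucket
# instead of once per unique input length.

BUCKETS = [128, 256, 512, 1024]  # assumed fixed sequence lengths
PAD_ID = 0                       # assumed padding token id

def pick_bucket(length):
    """Smallest bucket that fits the prompt, so shapes stay in a fixed set."""
    for b in BUCKETS:
        if length <= b:
            return b
    raise ValueError(f"prompt of length {length} exceeds max bucket {BUCKETS[-1]}")

def pad_batch(prompts):
    """Pad a batch of token-id lists to one shared bucket length.

    Every batch then has a (batch_size, bucket) shape drawn from a small
    fixed set, so the accelerator graph is reused rather than recompiled.
    """
    bucket = pick_bucket(max(len(p) for p in prompts))
    return [p + [PAD_ID] * (bucket - len(p)) for p in prompts]

batch = pad_batch([[5, 6, 7], [1, 2, 3, 4, 5]])
# Both rows now share the same static length (the smallest fitting bucket).
```

This is also why batched generation pays off on TPU: the fixed-shape compile cost is amortized over many sequences at once, whereas at batch size 1 the per-step overhead dominates.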