4.5 times faster Hugging Face transformer inference by modifying some Python AST

Follow the full discussion on Reddit.
The 🤗 Hugging Face team recently released a commercial product called Infinity for very high-performance inference (that is, very fast compared to a PyTorch + FastAPI deployment). Unfortunately, it is a paid product: according to their product director, it costs 20K for one model deployed on a single machine, and no information on price scaling is publicly available.



