The entire dataset of 1.7M+ arXiv papers is now available on Kaggle. We can't wait to see what the machine learning community will do with it!
arXiv just released a dataset of all 1.7 million of its articles. #opensource #nlp
Also available via the arXiv OAI API (I don’t think you need to go via Kaggle if you don’t want to). ICYMI: It was described in an ICLR 2019 paper from a Googler. I wasn’t expecting a data release at all, actually!
Leveraging Machine Learning to Fuel New Discoveries with the arXiv Dataset - excellent news