PyData Miami 2022

State-of-the-art Text Mining with Spark NLP
09-22, 10:50–11:20 (US/Eastern), Main Room


This session introduces the text mining capabilities of the open-source Spark NLP library. Spark NLP provides state-of-the-art accuracy, speed, and scalability for language understanding by delivering production-grade implementations of recent research advances. It is the most widely used NLP library in the enterprise today; it provides thousands of current, supported, pre-trained models for 250+ languages out of the box; and it is the only open-source NLP library that can natively scale to use any Apache Spark cluster.


Prior Knowledge Expected

Previous knowledge expected

David Talby is chief technology officer at John Snow Labs, helping healthcare and life science companies put AI to good use. David is the creator of Spark NLP, the world’s most widely used natural language processing library in the enterprise. He has extensive experience building and running web-scale software platforms and teams: in startups, at Microsoft’s Bing in the US and Europe, and at Amazon, where he scaled financial systems in Seattle and the UK. David holds a PhD in computer science and master’s degrees in both computer science and business administration.