PyData Miami 2022
How to Enhance Your Machine Learning Models with Genetic Algorithms
State-of-the-art Text Mining with Spark NLP
Aleksandra Kalisz
Panel Discussion - Diversity in Data Science
This talk gives an overview of how NLP is being used in research and industry to preserve at-risk languages, power technologies that address pressing problems (like employment matching), and create culturally attuned NLP tools (like sentiment analysis). Current challenges include data ownership and local populations' rights to their data. Examples come from partnerships in Sub-Saharan Africa, but they apply to other regions of the world as well.
Noelle Silver Russell
Speed up your Machine Learning Applications with Lightning AI
MovingPandas is an open-source Python library for working with trajectory data. It is especially useful for working with AIS data, which records the locations of vessels. In this talk I'll present the methods and algorithms implemented in MovingPandas and discuss the insights they can provide for shipping companies, port operators and government agencies.
In this talk we'll see how to easily run your code at scale with Docker and AWS Batch. We'll cover how to start scaling your workloads once the laptop isn't enough and how open-source tools can help you achieve that. Getting the best performance out of an ML model involves training at scale, and at times it requires heavy-duty GPUs, which in turn require infrastructure work, security, permissions and operations. We'll cover the steps to deploy without the Ops, letting you as a data scientist focus on the important task: getting the most out of your models!
In your current production environment or data science practice, do you have applications with low-latency constraints? Do you have multiple team members working in different frameworks and deploying across multiple inference servers? Are you still using Flask to deploy your models? If your answer is yes, then you can leverage the capabilities of open-source inference servers to standardize your model deployment on both CPUs and GPUs across different frameworks. Post-training model optimization is a key factor in deploying more complex ensemble models while still maintaining low latency. We will also look at how to do model ensembling across multiple framework backends (PyTorch, TensorFlow, Python), along with running multiple model copies on a single inference server instance.
Enterprise search is a key use case in big data and business computing. In this talk we introduce Enterprise Semantic Search with Large Language Models (LLMs) and present a working demonstration in the financial domain. Semantic search is search based on meaning representations rather than on the literal keywords of documents and queries. We use the recent HuggingFace transformers library, together with related Python libraries (TensorFlow, sklearn and UMAP) for NLP and deep learning. We introduce approaches, data visualization, metrics and datasets for search system evaluation. The talk will be of interest to developers working on text search and new unstructured data applications. Slides and a demo notebook will be available at the time of PyData Miami 2022.
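To make the idea of search over meaning representations concrete, here is a minimal sketch in plain Python. It is not the demo from the talk: the document titles, the query and the 3-dimensional vectors are all invented for illustration, and the helper names `cosine_similarity` and `rank_documents` are hypothetical. In a real system the vectors would come from a language model rather than being written by hand.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical embeddings, made up for this sketch (not model output).
doc_vecs = {
    "quarterly earnings report": [0.9, 0.1, 0.0],
    "employee parking policy": [0.0, 0.2, 0.9],
    "profit forecast memo": [0.8, 0.3, 0.1],
}

# Stands in for the embedding of a query such as "company revenue outlook".
query_vec = [0.85, 0.2, 0.05]

ranking = rank_documents(query_vec, doc_vecs)
for doc_id, score in ranking:
    print(f"{score:.3f}  {doc_id}")
```

The query shares no keywords with either finance document, yet both score far above the parking policy because their vectors point in a similar direction. That is the behavior a keyword match alone would not give.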