Data Phoenix Digest - ISSUE 9.2023

Hey folks,

Welcome to this week's edition of Data Phoenix Digest! Today we're introducing the lineup of upcoming Data Phoenix community webinars our team has prepared for you.

Stay active in our community: join our Slack to discuss community news, top research papers, articles, events, jobs, and more.

Join our Slack

📣

Want to promote your company, conference, job, or event to the Data Phoenix community of Data & AI researchers and engineers? Click here for details.

Upcoming Data Phoenix webinars

LLM Evaluations - What and Why

Large language models are trained on billions of data points and perform exceptionally well across a wide range of tasks. However, one area where these models often fall short is their lack of determinism. While building a prototype of an LLM application has become remarkably easy, turning that prototype into a fully-fledged product remains challenging. Even with carefully crafted prompts, the model can exhibit problematic behavior for certain inputs, such as hallucinations, incorrect output structures, toxic or biased responses, or irrelevant replies. The list of potential failure modes is long.

This is where a robust LLM evaluation tool like UpTrain comes in. It lets you:

Validate and correct the model's responses before presenting them to end-users.
Obtain quantitative measures for experimenting with multiple prompts, model providers, and more.
Conduct unit testing to ensure that no faulty prompts or code make their way into your production environment.

Join us for an insightful talk as we delve deep into the intricacies of assessing the performance and quality of LLMs and discover the best practices to ensure the reliability and accuracy of your LLM applications.

Multilingual Semantic Search

Connecting Large Language Models with embeddings and semantic search on your own data has become widely popular. But how does this work in other languages, and across languages? Join me for this talk on why multilingual semantic search is so powerful, how the underlying models are trained, and which new use cases it unlocks.

Rise in the use of synthetic data for regulated industries

Synthetic data is evolving and becoming extremely important for organizations. This session will cover the key facts about synthetic data, some of its most impactful use cases, and the challenges companies face in putting it to work.

How to use LLMs to Interface with Multiple Data Sources

Drawing on emerging industry best practices for Large Language Model Operations (LLM Ops), this session introduces the key technologies that let Generative AI practitioners like you build complex LLM applications. Specifically, we’ll take a deep dive into “data frameworks” like LlamaIndex, and we’ll demonstrate how to create state-of-the-art hierarchical indexes from different data sources. During the event, we will also show you how another well-known LLM Ops framework (LangChain) underlies much of the functionality of LlamaIndex. All demo code will be provided via GitHub links during and after the event!

Best practices for building LLM-based applications

Many businesses have started incorporating Large Language Models into their applications. There are, however, several challenges that may impact such systems, and it’s best to be aware of them before you start. During the talk, we will review the existing tools and see how to move from development to production without a headache.

Leveraging Large Language Models for Enterprise Usage

Organizations worldwide are still trying to understand how to leverage generative AI models and put them into practical use. To help them, NVIDIA has developed a full-stack approach, from the hardware used to develop and serve these models to a variety of customizable SDKs and services that support research and industry alike. However, LLMs, like any other technology, are not perfect and require guardrails to address shortcomings such as hallucination, inherited bias, and toxicity. By providing toolsets and mechanisms to mitigate these limitations, we hope generative AI will open up new horizons and bring about a positive revolution in the years ahead. Join this talk to learn about foundation and ChatGPT-style models, generative AI and LLM technology at NVIDIA, shortcomings and proposed guardrails, and the road ahead.

Video recordings of past Data Phoenix webinars

Building production ready LLMs with specialisation

LLM inference can be very expensive, requiring access to powerful GPUs. In this talk, Meryem discusses ways to reduce this cost by over 90% through a better choice of model, hardware, and model compression techniques. This is an essential talk for anyone looking to put LLMs into production.

Watch Video

Unlocking Data Value with Large Language Models

Large Language Models, or Foundation Models (FMs), are what power Generative AI applications. FMs challenge classical Machine Learning with a paradigm shift towards Prompt Engineering, the new way of building ML applications for businesses. In this talk, we will discuss how businesses can leverage FMs using Prompt Engineering and build Generative AI applications in the cloud. We will also go over the architectural components, resources for getting started, and how much it all costs.

Watch Video

📣

Don't miss out! Subscribe to our YouTube channel now and be the first to be notified about video recordings of past events and other valuable content to help you stay ahead!

Subscribe
