💡 Get 30 (free) AI project ideas: https://30aiprojects.com/
This is the 6th video in a series on using large language models (LLMs) in practice. Here, I review key aspects of building a foundation LLM, drawing on how models such as GPT-3, Llama, and Falcon were developed.
More Resources:
▶️ Series Playlist: https://www.youtube.com/playlist?list=PLz-ep5RbHosU2hnz5ejezwaYpdMutMVB0
📰 Read more: https://medium.com/towards-data-science/how-to-build-an-llm-from-scratch-8c477768f1f9?sk=18c351c5cae9ac89df682dd14736a9f3
[1] BloombergGPT: https://arxiv.org/pdf/2303.17564.pdf
[2] Llama 2: https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/
[3] LLM Energy Costs: https://www.statista.com/statistics/1384401/energy-use-when-training-llm-models/
[4] arXiv:2005.14165 [cs.CL]
[5] Falcon 180B Blog: https://huggingface.co/blog/falcon-180b
[6] arXiv:2101.00027 [cs.CL]
[7] Alpaca Repo: https://github.com/gururise/AlpacaDataCleaned
[8] arXiv:2303.18223 [cs.CL]
[9] arXiv:2112.11446 [cs.CL]
[10] arXiv:1508.07909 [cs.CL]
[11] SentencePiece: https://github.com/google/sentencepiece/tree/master
[12] Tokenizers Doc: https://huggingface.co/docs/tokenizers/quicktour
[13] arXiv:1706.03762 [cs.CL]
[14] Andrej Karpathy Lecture: https://www.youtube.com/watch?v=kCc8FmEb1nY&t=5307s
[15] Hugging Face NLP Course: https://huggingface.co/learn/nlp-course/chapter1/7?fw=pt
[16] arXiv:1810.04805 [cs.CL]
[17] arXiv:1910.13461 [cs.CL]
[18] arXiv:1603.05027 [cs.CV]
[19] arXiv:1607.06450 [stat.ML]
[20] arXiv:1803.02155 [cs.CL]
[21] arXiv:2203.15556 [cs.CL]
[22] Train With Mixed Precision (Nvidia): https://docs.nvidia.com/deeplearning/performance/mixed-precision-training/index.html
[23] DeepSpeed Doc: https://www.deepspeed.ai/training/
[24] https://paperswithcode.com/method/weight-decay
[25] https://towardsdatascience.com/what-is-gradient-clipping-b8e815cdfb48
[26] arXiv:2001.08361 [cs.LG]
[27] arXiv:1803.05457 [cs.AI]
[28] arXiv:1905.07830 [cs.CL]
[29] arXiv:2009.03300 [cs.CY]
[30] arXiv:2109.07958 [cs.CL]
[31] https://huggingface.co/blog/evaluating-mmlu-leaderboard
[32] https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf
—
Homepage: https://shawhintalebi.com/
Book a call: https://calendly.com/shawhintalebi
Intro – 0:00
How much does it cost? – 1:30
4 Key Steps – 3:55
Step 1: Data Curation – 4:19
1.1: Data Sources – 5:31
1.2: Data Diversity – 7:45
1.3: Data Preparation – 9:06
Step 2: Model Architecture (Transformers) – 13:17
2.1: 3 Types of Transformers – 15:13
2.2: Other Design Choices – 18:27
2.3: How big do I make it? – 22:45
Step 3: Training at Scale – 24:20
3.1: Training Stability – 26:52
3.2: Hyperparameters – 28:06
Step 4: Evaluation – 29:14
4.1: Multiple-choice Tasks – 30:22
4.2: Open-ended Tasks – 32:59
What’s next? – 34:31