📄️ Input Format
The input params are exactly the same as OpenAI's Create chat completion endpoint, letting you call 100+ models in the same format.
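A minimal sketch of what this looks like in practice, assuming the relevant provider API keys are set as environment variables:

```python
from litellm import completion

# OpenAI-style messages work across providers
messages = [{"role": "user", "content": "Hello, how are you?"}]

# OpenAI
response = completion(model="gpt-3.5-turbo", messages=messages)

# Anthropic - same call, only the model string changes
response = completion(model="claude-2", messages=messages)
```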
📄️ Output
The format of the response object returned by completion() - consistent across all models.
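Since responses follow the OpenAI response object shape, fields can be read the same way for every model. A minimal sketch:

```python
from litellm import completion

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey"}],
)

# OpenAI-style fields, regardless of the underlying provider
print(response.choices[0].message.content)
print(response.usage.total_tokens)  # token accounting for the call
```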
📄️ Streaming + Async
How to get streaming responses and make async completion calls.
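A minimal sketch of both patterns, using stream=True for streaming and acompletion() for async:

```python
import asyncio
from litellm import completion, acompletion

messages = [{"role": "user", "content": "Write a haiku"}]

# Streaming: pass stream=True and iterate over chunks
for chunk in completion(model="gpt-3.5-turbo", messages=messages, stream=True):
    print(chunk.choices[0].delta.content or "", end="")

# Async: await acompletion() inside a coroutine
async def main():
    response = await acompletion(model="gpt-3.5-turbo", messages=messages)
    print(response.choices[0].message.content)

asyncio.run(main())
```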
📄️ Trimming Input Messages
Use litellm.trim_messages() to ensure messages do not exceed a model's token limit or a specified max_tokens.
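A minimal sketch, trimming a long history to fit the target model's context window before sending it:

```python
import litellm
from litellm import completion

# Potentially very long conversation history
messages = [{"role": "user", "content": "long conversation history here"}]

# Trim to fit the model's context window, then call as usual
response = completion(
    model="gpt-3.5-turbo",
    messages=litellm.trim_messages(messages, model="gpt-3.5-turbo"),
)
```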
📄️ Model Alias
The model name you show an end-user might differ from the one you pass to LiteLLM - e.g. displaying GPT-3.5 while calling gpt-3.5-turbo-16k on the backend.
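A minimal sketch using litellm.model_alias_map to register the user-facing alias:

```python
import litellm
from litellm import completion

# Map the user-facing name to the model actually called on the backend
litellm.model_alias_map = {"GPT-3.5": "gpt-3.5-turbo-16k"}

# Callers can now pass the alias as the model name
response = completion(
    model="GPT-3.5",
    messages=[{"role": "user", "content": "Hey"}],
)
```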
📄️ Reliability
Helper utils for retries and fallbacks, so failed requests don't break your app.
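A minimal sketch of both helpers, assuming your LiteLLM version supports the num_retries and fallbacks params on completion():

```python
from litellm import completion

messages = [{"role": "user", "content": "Hey"}]

# Retry transient failures a few times before giving up
response = completion(model="gpt-3.5-turbo", messages=messages, num_retries=3)

# Fall back to other models if the primary call fails
response = completion(
    model="bad-model",
    messages=messages,
    fallbacks=["gpt-3.5-turbo", "command-nightly"],
)
```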
📄️ Model Config
Model-specific changes can complicate your code and make errors harder to debug. Use model configs to keep those settings in one place.
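A minimal sketch of the idea, using a plain per-model dict rather than any specific LiteLLM config API; the settings shown are illustrative assumptions:

```python
from litellm import completion

# Hypothetical per-model settings kept in one place instead of
# scattered if/else branches throughout the codebase
model_config = {
    "gpt-3.5-turbo": {"max_tokens": 256, "temperature": 0.2},
    "claude-2": {"max_tokens": 512, "temperature": 0.7},
}

def call_model(model: str, messages: list):
    # Look up model-specific kwargs in one place
    params = model_config.get(model, {})
    return completion(model=model, messages=messages, **params)

response = call_model("gpt-3.5-turbo", [{"role": "user", "content": "Hey"}])
```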
📄️ Batching Completion() Calls
LiteLLM allows you to batch completion() calls - e.g. send many requests to one model, or one request to many models; see the sketch below.
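A minimal sketch using batch_completion(), which takes a list of message lists and returns one response per prompt:

```python
from litellm import batch_completion

# One model, multiple prompts - returns a list of responses
responses = batch_completion(
    model="gpt-3.5-turbo",
    messages=[
        [{"role": "user", "content": "Hey, how's it going?"}],
        [{"role": "user", "content": "What's the capital of France?"}],
    ],
)

for r in responses:
    print(r.choices[0].message.content)
```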
📄️ Mock Completion() Responses - Save Testing Costs 💰
For testing purposes, you can use completion() with mock_response to mock calling the completion endpoint.
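A minimal sketch - no API call is made, and the mock string comes back in a normal OpenAI-style response object:

```python
from litellm import completion

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hey"}],
    mock_response="This is a mock response",
)
print(response.choices[0].message.content)  # "This is a mock response"
```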