The following page may contain information related to upcoming products, features and functionality. It is important to note that the information presented is for informational purposes only, so please do not rely on the information for purchasing or planning purposes. Just like with all projects, the items mentioned on the page are subject to change or delay, and the development, release, and timing of any products, features or functionality remain at the sole discretion of GitLab Inc.
| | |
| --- | --- |
| Stage | AI-powered |
| Group | AI Model Validation |
| Maturity | Available |
| Content Last Reviewed | 2024-10-05 |
The AI Research category aims to identify and explore AI/ML models that support the use cases other GitLab sections, stages, and groups are developing to enrich the DevSecOps experience for GitLab users.
We continuously evaluate AI/ML model vendors, open source models, and generative AI foundation models. Models that show promising results in our initial research and exploration are tested further via the AI Evaluation platform, where they are compared against models already supported in our AI Framework that actively power GitLab Duo features, including self-hosted models.
We evaluate models against a wide range of criteria to support our enterprise customers' needs.
GitLab has built an advanced model evaluation platform that we call our prompt library. It contains thousands of human-generated and synthetically generated prompts that we use to evaluate various AI models and different versions of those models.
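As an illustration only, the sketch below shows what a single prompt-library record might look like. The field names and schema are assumptions for this example, not GitLab's actual data model.

```python
from dataclasses import dataclass, field

@dataclass
class EvalPrompt:
    """One hypothetical record in a prompt library (illustrative schema only)."""
    prompt_id: str                      # stable identifier for tracking across runs
    prompt_text: str                    # the input sent to the model under test
    reference_answer: str               # human- or synthetically generated benchmark answer
    source: str = "human"               # "human" or "synthetic"
    tags: list[str] = field(default_factory=list)  # e.g. feature area, language

# Example record
example = EvalPrompt(
    prompt_id="code-suggestions-0001",
    prompt_text="Write a Python function that reverses a linked list.",
    reference_answer="def reverse(head): ...",
    source="human",
    tags=["code-suggestions", "python"],
)
```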
We use this suite of evaluations to run large-scale testing of AI model output quality against both human-generated and synthetic benchmarks. We leverage techniques such as cosine similarity, embedding similarity, and LLM evaluators (LLMs evaluating the outputs of other LLMs). No single technique is perfect, so we blend them to compare the quality of different models and model versions.
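As a rough illustration of how such techniques can be combined, the sketch below computes cosine similarity between the embedding of a candidate output and the embedding of a benchmark answer, then blends it with an LLM-as-judge score. The weighting and the assumption that the judge score is normalized to [0, 1] are illustrative choices, not GitLab's actual formula.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def blended_score(candidate_embedding: list[float],
                  reference_embedding: list[float],
                  llm_judge_score: float,
                  similarity_weight: float = 0.5) -> float:
    """Blend embedding similarity with an LLM-as-judge score.

    `llm_judge_score` is assumed to be in [0, 1]; the 50/50 weighting is an
    illustrative assumption.
    """
    similarity = cosine_similarity(candidate_embedding, reference_embedding)
    return similarity_weight * similarity + (1 - similarity_weight) * llm_judge_score
```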
This system allows GitLab to evaluate both new models and updated versions of models we already support. We have already used it to catch issues with model updates pushed by our AI vendors, and in some cases it has detected model drift that vendors did not anticipate or communicate to GitLab.
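One simple way drift of this kind could be flagged is sketched below: compare per-prompt scores from two evaluation runs of the nominally same model and raise a flag when too many prompts move beyond a tolerance. The thresholds and data shapes are illustrative assumptions, not GitLab's production settings.

```python
def detect_drift(baseline_scores: dict[str, float],
                 current_scores: dict[str, float],
                 per_prompt_tolerance: float = 0.05,
                 drift_fraction_threshold: float = 0.10) -> bool:
    """Flag drift when too many prompts change score beyond a tolerance.

    Scores are keyed by prompt_id and assumed to be in [0, 1].
    """
    shared = baseline_scores.keys() & current_scores.keys()
    if not shared:
        raise ValueError("No overlapping prompts between the two runs")
    changed = sum(
        1 for pid in shared
        if abs(current_scores[pid] - baseline_scores[pid]) > per_prompt_tolerance
    )
    return changed / len(shared) > drift_fraction_threshold
```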
While our AI Evaluation suite currently provides a point-in-time comparison, we are working to automate testing so that the entire suite runs against models regularly, detecting drift and model regressions continuously.
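A minimal sketch of the kind of regression gate a scheduled job could run is shown below: it compares the latest suite results against a stored baseline and exits nonzero when the mean score drops too far. The JSON file format, threshold, and function names are assumptions for illustration only.

```python
import json
import statistics
import sys

def regression_gate(baseline_path: str, current_path: str, max_drop: float = 0.02) -> None:
    """Exit nonzero when the mean suite score regresses by more than `max_drop`.

    Both files are assumed to hold {"prompt_id": score} JSON maps produced by
    an evaluation run.
    """
    with open(baseline_path) as f:
        baseline = json.load(f)
    with open(current_path) as f:
        current = json.load(f)

    baseline_mean = statistics.mean(baseline.values())
    current_mean = statistics.mean(current.values())

    if baseline_mean - current_mean > max_drop:
        print(f"Regression: mean score fell from {baseline_mean:.3f} to {current_mean:.3f}")
        sys.exit(1)
    print(f"OK: mean score {current_mean:.3f} (baseline {baseline_mean:.3f})")

if __name__ == "__main__":
    regression_gate(sys.argv[1], sys.argv[2])
```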