

[AI] (1) Open-source large model rankings

Posted on 2024-12-28 10:03:05
Chatbot Arena LLM Leaderboard

Chatbot Arena is an open platform for crowdsourced AI benchmarking, developed by researchers at SkyLab and LMArena at the University of California, Berkeley. With over 1,000,000 user votes, the platform uses the Bradley-Terry model to generate real-time leaderboards that rank the best LLMs and AI chatbots. For technical details, see the project's paper.
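
To make the Bradley-Terry idea concrete, here is a minimal sketch of fitting model strengths from pairwise votes with the classic iterative (minorization-maximization) update. It is an illustration only and not Chatbot Arena's actual implementation; the function name `fit_bradley_terry`, the model names, and the vote data are all hypothetical. It assumes every vote has a clear winner (no ties) and every model has won at least once.

```python
from collections import defaultdict


def fit_bradley_terry(votes, iterations=200):
    """Estimate Bradley-Terry strengths from (winner, loser) votes.

    Hypothetical helper for illustration. Returns a dict mapping each
    model name to a strength; strengths are normalized to sum to 1.
    Assumes no ties and that every model has at least one win.
    """
    wins = defaultdict(int)         # total wins per model
    pair_counts = defaultdict(int)  # number of games per unordered pair
    models = set()
    for winner, loser in votes:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        models.update((winner, loser))

    # Start from equal strengths.
    strengths = {m: 1.0 / len(models) for m in models}
    for _ in range(iterations):
        updated = {}
        for m in models:
            # Sum over opponents: games played / (own + opponent strength).
            denom = sum(
                count / (strengths[m] + strengths[other])
                for pair, count in pair_counts.items()
                if m in pair
                for other in pair - {m}
            )
            updated[m] = wins[m] / denom
        total = sum(updated.values())
        strengths = {m: s / total for m, s in updated.items()}
    return strengths


# Made-up votes: model_a beats model_b three times, loses once.
example_votes = [("model_a", "model_b")] * 3 + [("model_b", "model_a")]
ranking = sorted(fit_bradley_terry(example_votes).items(),
                 key=lambda item: item[1], reverse=True)
for name, strength in ranking:
    print(name, round(strength, 3))
```

Under this model, the probability that one model beats another is its strength divided by the sum of both strengths, so in the two-model example above the fitted strengths reproduce the observed 3-to-1 win ratio.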



LiveBench

A benchmark built specifically for LLMs, designed with test set contamination and objective evaluation in mind.



SuperCLUE Overall Leaderboard

CLUE's mission: to better serve Chinese language understanding, its tasks, and the industry. As a supplement to general language model evaluation, it improves the infrastructure by collecting, organizing, and publishing Chinese tasks and standardized evaluations, and ultimately promotes the development of Chinese NLP.



Open LLM Leaderboard

Compare large language models in an open and reproducible way.



Comparison of large model evaluation benchmarks and performance

This page shows how multiple mainstream large models perform on various evaluation benchmarks, including MMLU, GSM8K, HumanEval, and other standard datasets. Its continuously updated results help developers and researchers understand how different models perform across tasks. Users can choose which models and benchmarks to compare to quickly see each model's strengths and weaknesses in practical applications.





Original poster | Posted on 2024-12-28 10:20:27
Qwen is a series of large language models and large multimodal models developed by the Qwen team of Alibaba Group.
Qwen2.5

DeepSeek-V3 is a Mixture-of-Experts (MoE) model developed in-house by DeepSeek, with 671B total parameters, of which 37B are activated per token, pre-trained on 14.8T tokens.
DeepSeek-V3

Zhipu is a company spun off from the research achievements of the Department of Computer Science of Tsinghua University.
GLM-4-9B
