Chatbot Arena LLM Leaderboard:
Chatbot Arena is an open platform for crowdsourced AI benchmarking, developed by researchers at UC Berkeley SkyLab and LMArena. With over 1,000,000 user votes, the platform uses the Bradley-Terry model to generate real-time leaderboards ranking the best LLMs and AI chatbots; technical details are described in the accompanying paper.
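As an illustration of this rating approach, here is a minimal sketch of fitting Bradley-Terry strengths to pairwise vote counts with the standard iterative (MM) update. The model names and vote counts are invented, and this is only a sketch of the general technique, not LMArena's actual pipeline.

```python
def fit_bradley_terry(wins, num_iters=100):
    """Fit Bradley-Terry strengths from pairwise vote counts.

    wins[(a, b)] = number of votes where model a beat model b.
    """
    models = {m for pair in wins for m in pair}
    strength = {m: 1.0 for m in models}  # initial strengths
    for _ in range(num_iters):
        new_strength = {}
        for m in models:
            # Total number of wins for model m
            w = sum(c for (a, b), c in wins.items() if a == m)
            # MM denominator: sum over games involving m of count / (s_m + s_opponent)
            denom = 0.0
            for (a, b), c in wins.items():
                if a == m:
                    denom += c / (strength[m] + strength[b])
                elif b == m:
                    denom += c / (strength[m] + strength[a])
            new_strength[m] = w / denom if denom > 0 else strength[m]
        # Rescale so strengths stay on a comparable scale across iterations
        total = sum(new_strength.values())
        strength = {m: s * len(models) / total for m, s in new_strength.items()}
    return strength

# Hypothetical vote counts: (winner, loser) -> count
votes = {("model_a", "model_b"): 70, ("model_b", "model_a"): 30,
         ("model_a", "model_c"): 55, ("model_c", "model_a"): 45,
         ("model_b", "model_c"): 60, ("model_c", "model_b"): 40}

ratings = fit_bradley_terry(votes)
for model, s in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {s:.3f}")
```

A leaderboard is then just the models sorted by their fitted strengths (often reported on a log scale as Elo-like scores), with confidence intervals obtained by resampling the votes.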
LiveBench:
A benchmark designed specifically for LLMs, built with test-set contamination and objective evaluation in mind.
SuperCLUE Overall Leaderboard:
CLUE's positioning: to better serve Chinese language understanding tasks and the industry, and to complement general language model evaluation, it builds out the evaluation infrastructure by collecting, organizing, and publishing Chinese tasks and standardized assessments, ultimately advancing Chinese NLP.
Open LLM Leaderboard:
Compare large language models in an open and reproducible way.
Comparison of large model evaluation benchmarks and performance:
This page shows how mainstream large models perform across common evaluation benchmarks, including standard datasets such as MMLU, GSM8K, and HumanEval. Continuously updated results help developers and researchers understand how different models perform on different tasks, and users can compare custom models against the benchmarks to quickly see each model's strengths and weaknesses in practical applications.
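As a rough illustration of the kind of side-by-side view such a comparison page provides, the sketch below ranks a few models by their mean score across MMLU, GSM8K, and HumanEval. All model names and scores here are hypothetical placeholders, not real leaderboard data.

```python
# Hypothetical benchmark scores (accuracy / pass@1), for illustration only
scores = {
    "model_a": {"MMLU": 0.71, "GSM8K": 0.58, "HumanEval": 0.48},
    "model_b": {"MMLU": 0.68, "GSM8K": 0.74, "HumanEval": 0.40},
    "model_c": {"MMLU": 0.75, "GSM8K": 0.62, "HumanEval": 0.55},
}
benchmarks = ["MMLU", "GSM8K", "HumanEval"]

def mean_score(model):
    # Average across benchmarks; real leaderboards may weight tasks differently
    return sum(scores[model][b] for b in benchmarks) / len(benchmarks)

# Rank models by mean score and print a simple comparison table
print("model".ljust(10) + "".join(b.rjust(12) for b in benchmarks) + "mean".rjust(12))
for m in sorted(scores, key=mean_score, reverse=True):
    row = "".join(f"{scores[m][b]:12.2f}" for b in benchmarks)
    print(m.ljust(10) + row + f"{mean_score(m):12.2f}")
```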