Chatbot Arena LLM Leaderboard:
Chatbot Arena is an open platform for crowdsourced AI benchmarking, developed by researchers at UC Berkeley SkyLab and LMArena. With over 1,000,000 user votes, the platform uses the Bradley-Terry model to generate real-time leaderboards ranking the best LLMs and AI chatbots; technical details are described in the accompanying paper.
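As an illustration of this rating approach, here is a minimal sketch of fitting Bradley-Terry strengths to pairwise vote counts with the standard iterative (MM) update. The model names and vote counts are invented, and this is only a sketch of the general technique, not LMArena's actual pipeline.

```python
def fit_bradley_terry(wins, num_iters=100):
    """Fit Bradley-Terry strengths from pairwise vote counts.

    wins[(a, b)] = number of votes where model a beat model b.
    """
    models = {m for pair in wins for m in pair}
    strength = {m: 1.0 for m in models}  # initial strengths
    for _ in range(num_iters):
        new_strength = {}
        for m in models:
            # Total number of wins for model m
            w = sum(c for (a, b), c in wins.items() if a == m)
            # MM denominator: sum over games involving m of count / (s_m + s_opponent)
            denom = 0.0
            for (a, b), c in wins.items():
                if a == m:
                    denom += c / (strength[m] + strength[b])
                elif b == m:
                    denom += c / (strength[m] + strength[a])
            new_strength[m] = w / denom if denom > 0 else strength[m]
        # Rescale so strengths stay on a comparable scale across iterations
        total = sum(new_strength.values())
        strength = {m: s * len(models) / total for m, s in new_strength.items()}
    return strength

# Hypothetical vote counts: (winner, loser) -> count
votes = {("model_a", "model_b"): 70, ("model_b", "model_a"): 30,
         ("model_a", "model_c"): 55, ("model_c", "model_a"): 45,
         ("model_b", "model_c"): 60, ("model_c", "model_b"): 40}

ratings = fit_bradley_terry(votes)
for model, s in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{model}: {s:.3f}")
```

A leaderboard is then just the models sorted by their fitted strengths (often reported on a log scale as Elo-like scores), with confidence intervals obtained by resampling the votes.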
LiveBench:
A benchmark designed specifically for LLMs, built with test-set contamination and objective evaluation in mind.
SuperCLUE Overall Leaderboard:
CLUE's positioning: to better serve Chinese language understanding tasks and the industry, and to complement general language model evaluation, it builds out the evaluation infrastructure by collecting, organizing, and publishing Chinese tasks and standardized assessments, ultimately advancing Chinese NLP.
Open LLM Leaderboard:
Compare large language models in an open and reproducible way.
Comparison of large model evaluation benchmarks and performance:
This page shows how mainstream large models perform across common evaluation benchmarks, including standard datasets such as MMLU, GSM8K, and HumanEval. Continuously updated results help developers and researchers understand how different models perform on different tasks, and users can compare custom models against the benchmarks to quickly see each model's strengths and weaknesses in practical applications.
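As a rough illustration of the kind of side-by-side view such a comparison page provides, the sketch below ranks a few models by their mean score across MMLU, GSM8K, and HumanEval. All model names and scores here are hypothetical placeholders, not real leaderboard data.

```python
# Hypothetical benchmark scores (accuracy / pass@1), for illustration only
scores = {
    "model_a": {"MMLU": 0.71, "GSM8K": 0.58, "HumanEval": 0.48},
    "model_b": {"MMLU": 0.68, "GSM8K": 0.74, "HumanEval": 0.40},
    "model_c": {"MMLU": 0.75, "GSM8K": 0.62, "HumanEval": 0.55},
}
benchmarks = ["MMLU", "GSM8K", "HumanEval"]

def mean_score(model):
    # Average across benchmarks; real leaderboards may weight tasks differently
    return sum(scores[model][b] for b in benchmarks) / len(benchmarks)

# Rank models by mean score and print a simple comparison table
print("model".ljust(10) + "".join(b.rjust(12) for b in benchmarks) + "mean".rjust(12))
for m in sorted(scores, key=mean_score, reverse=True):
    row = "".join(f"{scores[m][b]:12.2f}" for b in benchmarks)
    print(m.ljust(10) + row + f"{mean_score(m):12.2f}")
```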