• ☆ Yσɠƚԋσʂ ☆@lemmy.mlOP
    2 days ago

    They charge less because the model is more efficient to run, which means it uses less power. They used a mixture-of-experts architecture to get far better performance than traditional dense models. While it has 671 billion parameters overall, it only activates 37 billion for any given token, making it very efficient. For comparison, Meta’s Llama 3.1 has 405 billion parameters and uses all of them at once. You can read all about it here: https://arxiv.org/abs/2405.04434
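
To make the "37 billion out of 671 billion" point concrete, here's a rough toy sketch in Python. It is not the model's actual routing code: the function names, the top-2 choice, and the tiny scalar "experts" are all made up for illustration. The only numbers taken from above are the parameter counts.

```python
import math

# Parameter counts quoted in the comment above, in billions.
TOTAL_PARAMS_B = 671
ACTIVE_PARAMS_B = 37
LLAMA_PARAMS_B = 405


def active_fraction(active, total):
    """Share of the model's parameters that run for one token."""
    return active / total


def top_k_route(scores, k):
    """Pick the k highest-scoring experts and softmax-normalize their scores.

    Hypothetical helper: real routers are learned layers, not a plain list.
    """
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in chosen)  # subtract the max for numerical stability
    exps = [math.exp(scores[i] - peak) for i in chosen]
    total = sum(exps)
    return chosen, [e / total for e in exps]


def moe_forward(x, experts, router_scores, k=2):
    """Run only the k selected experts and mix their outputs by router weight."""
    chosen, weights = top_k_route(router_scores, k)
    return sum(w * experts[i](x) for i, w in zip(chosen, weights))


if __name__ == "__main__":
    frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
    print(f"active share per token: {frac:.1%}")
    print(f"dense 405B vs 37B active: {LLAMA_PARAMS_B / ACTIVE_PARAMS_B:.1f}x")

    # Four toy experts that just scale the input; only two of them ever run.
    experts = [lambda x, s=s: x * s for s in (1, 2, 3, 4)]
    print(moe_forward(1.0, experts, router_scores=[0.1, 2.0, -1.0, 2.0]))
```

The idea is that the router scores every expert, but only the top few actually do any work for a given token, so compute scales with the active parameters rather than the total.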