Salesforce Introduces XGen-7B, a New Series of 7B Large Language Models (LLMs)
Last Updated : 06/29/2023 13:40:54

US-based Software-as-a-Service (SaaS) giant Salesforce has recently introduced XGen-7B, a series of 7B Large Language Models (LLMs) trained with an 8K input sequence length.


The models are released under the Apache 2.0 license.

On standard NLP benchmarks, XGen achieves comparable or better results than other open-source LLMs such as Falcon, LLaMA, Redpajama, and OpenLLaMA.

We trained a series of 7B LLMs named XGen-7B with standard dense attention on up to an 8K sequence length for up to 1.5T tokens. We also fine-tune the models on public-domain instructional data. The main takeaways are:


* On standard NLP benchmarks, XGen achieves comparable or better results when compared with state-of-the-art open-source LLMs (e.g., MPT, Falcon, LLaMA, Redpajama, OpenLLaMA) of similar model size.

* Our targeted evaluation on long sequence modeling benchmarks shows the benefits of our 8K-seq models over 2K- and 4K-seq models.

* XGen-7B achieves equally strong results on both text (e.g., MMLU, QA) and code (HumanEval) tasks.

* The training cost was $150K for 1T tokens under Google Cloud pricing for TPU-v4.

Why XGen-7B with 8K Sequence Length:

As LLMs become ubiquitous, their application to long sequences has been a key focus, especially for tasks like summarizing text (potentially interleaved with other data sources such as tables and images), writing code, and predicting protein sequences, which require the model to effectively consider long-distance structural dependencies. A large context allows a pre-trained LLM to look at customer data (e.g., documents the LLM did not use in training) and respond to useful information-seeking queries.

Yet most open-source LLMs (e.g., LLaMA, MPT, Falcon) have been trained with a maximum sequence length of 2K tokens, which is a key limitation in modeling long sequences. Inference-time solutions such as ALiBi have yet to be evaluated for larger models (e.g., MPT-7b-StoryWriter-65k+).

Recent work on model scaling has shown that, for a given compute budget, the best performance is not necessarily achieved by the largest models, but by smaller models trained on more data (measured by number of tokens). A smaller model is also generally preferred for inference efficiency during serving, including on-device serving.

In light of this, we train a series of 7B LLMs named XGen with standard dense attention on up to an 8K sequence length for up to 1.5T tokens. We also fine-tune the XGen models on public-domain instructional data, creating their instruction-tuned counterparts (XGen-7B-inst).
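To make the context-length point concrete, here is a minimal sketch, assuming the Hugging Face transformers library and the Salesforce/xgen-7b-8k-base checkpoint, of checking whether a long document plus a question fits into an 8K window; the file name and question are hypothetical, and the 8192-token limit is taken from the announced 8K sequence length.

```python
# Sketch: count prompt tokens to see whether a long document fits in an 8K context.
# Assumes the `transformers` package and the Salesforce/xgen-7b-8k-base checkpoint.
from transformers import AutoTokenizer

# XGen ships a custom (tiktoken-based) tokenizer, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(
    "Salesforce/xgen-7b-8k-base", trust_remote_code=True
)

long_document = open("customer_report.txt").read()  # hypothetical long document
prompt = f"{long_document}\n\nQuestion: What are the key findings?\nAnswer:"

n_tokens = len(tokenizer(prompt)["input_ids"])
print(f"Prompt length: {n_tokens} tokens")

# A 2K-context model would force truncation or chunking at this point;
# an 8K-context model can take the whole prompt while n_tokens <= 8192.
if n_tokens > 8192:
    print("Prompt exceeds the 8K window; it must be truncated or chunked.")
```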
Model Description

* XGen-7B-4K-base: Trained for 800B tokens with a sequence length of 2K tokens first, then for another 400B tokens (1.2T tokens in total) with a 4K sequence length. Released under Apache-2.0.

* XGen-7B-8K-base: Initialized with XGen-7B-4K-base and further trained for 300B more tokens (1.5T tokens in total) with an 8K sequence length. Released under Apache-2.0.

* XGen-7B-{4K,8K}-inst: Supervised fine-tuned on public-domain instructional data, including databricks-dolly-15k, oasst1, Baize, and GPT-related datasets. Released for research purposes only.
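For reference, the sketch below shows one way to load the released 8K base checkpoint and generate text with the Hugging Face transformers library. The prompt text is an arbitrary example, and the bfloat16 precision and generation settings are assumptions that depend on the available hardware, not requirements stated in the announcement.

```python
# Minimal generation sketch, assuming the `torch` and `transformers` packages
# and the Salesforce/xgen-7b-8k-base checkpoint hosted on Hugging Face.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

MODEL_ID = "Salesforce/xgen-7b-8k-base"

# trust_remote_code=True is needed because XGen uses a custom, tiktoken-based tokenizer.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

inputs = tokenizer("Salesforce XGen-7B is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The instruction-tuned variants (XGen-7B-{4K,8K}-inst) can be loaded the same way by swapping the checkpoint name, keeping in mind that they are released for research purposes only.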


Codebase: https://github.com/salesforce/xGen

Model : https://huggingface.co/Salesforce/xgen-7b-8k-base
