Databricks Releases Dolly 2.0: The Open LLM for Commercial Use

Name: Viktor Zinchenko

Updated on 7/24/2023

Databricks has released Dolly 2.0, which it describes as the first open, instruction-following LLM licensed for commercial use. Because both the model and its fine-tuning data can be used commercially, businesses and developers can build applications that follow natural-language instructions on an open model instead of depending on a proprietary one.

What is Dolly 2.0?

Dolly 2.0 is an instruction-following large language model, trained on the Databricks machine-learning platform and licensed for commercial use. It is based on EleutherAI's Pythia-12b and fine-tuned on roughly 15,000 instruction/response records (released as the databricks-dolly-15k dataset) written by Databricks employees across several capability domains, including brainstorming, classification, closed QA, generation, information extraction, open QA, and summarization.
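To make the shape of the fine-tuning data concrete, here is a minimal sketch of a single record. The field names (instruction, context, response, category) follow the public databricks-dolly-15k dataset as we understand it; treat them as an assumption and check the dataset card before relying on them. The sample records and the count_by_category helper are hypothetical illustrations, not Databricks code.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class DollyRecord:
    """One fine-tuning record; field names assumed from databricks-dolly-15k."""

    instruction: str  # the task, written by a Databricks employee
    context: str      # optional reference passage (e.g. Wikipedia text); empty if unused
    response: str     # the human-written answer
    category: str     # capability domain label, e.g. "closed_qa"


def count_by_category(records: Iterable[DollyRecord]) -> Counter:
    """Tally how many records fall into each capability domain."""
    return Counter(record.category for record in records)


# Hypothetical records for illustration; they are not taken from the real dataset.
samples = [
    DollyRecord(
        instruction="When was the Eiffel Tower completed?",
        context="The Eiffel Tower in Paris was completed in 1889.",
        response="It was completed in 1889.",
        category="closed_qa",
    ),
    DollyRecord(
        instruction="Which river flows through Paris?",
        context="Paris is built along the Seine river in northern France.",
        response="The Seine.",
        category="closed_qa",
    ),
    DollyRecord(
        instruction="Suggest three names for a neighborhood coffee shop.",
        context="",  # brainstorming tasks need no reference passage
        response="Bean There, The Daily Grind, Corner Brew.",
        category="brainstorming",
    ),
]

print(count_by_category(samples))
```

The context field is what distinguishes categories such as closed QA and summarization, where the annotator supplied a reference passage, from open-ended categories such as brainstorming, where it is left empty.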

How Does Dolly 2.0 Work?

Dolly 2.0 is a causal (decoder-only) language model: given a natural-language instruction, it generates a response that follows that instruction, one token at a time. It can be used for tasks such as closed question answering, summarization, and text generation.
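The instruction is not fed to the model verbatim; it is wrapped in a fixed prompt template that mirrors the format used during fine-tuning. The sketch below is modeled on the template in Databricks' instruct_pipeline.py as we recall it; the exact wording of INTRO_BLURB and the key strings is an assumption, so compare it against the repository before reusing it. build_prompt and extract_response are hypothetical helper names.

```python
# Assumed Dolly-style prompt pieces; verify against instruct_pipeline.py.
INTRO_BLURB = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request."
)
INSTRUCTION_KEY = "### Instruction:"
RESPONSE_KEY = "### Response:"
END_KEY = "### End"


def build_prompt(instruction: str) -> str:
    """Wrap a raw instruction so the prompt ends where the model should start answering."""
    return f"{INTRO_BLURB}\n\n{INSTRUCTION_KEY}\n{instruction}\n\n{RESPONSE_KEY}\n"


def extract_response(generated: str) -> str:
    """Keep only the text after the response key, cut at the end key if one was emitted."""
    _, sep, tail = generated.partition(RESPONSE_KEY)
    text = tail if sep else generated  # fall back to the whole string if no key is present
    text = text.split(END_KEY, 1)[0]   # drop anything after the end marker
    return text.strip()
```

For example, build_prompt("Name a primary color.") produces a prompt ending in "### Response:" followed by a newline, and extract_response turns a completion such as "...### Response:\nRed.\n\n### End" into "Red.". The pipeline shipped in the Hugging Face repo applies its own template and trimming, so helpers like these would only be needed when calling the model through a lower-level generation API.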

To use Dolly 2.0, install the Hugging Face Transformers and Accelerate libraries. The instruction-following pipeline can then be loaded with the Transformers pipeline function and called with an instruction to generate a response.
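The sketch below follows the loading pattern shown on the dolly-v2-12b model card as we recall it: pipeline(model="databricks/dolly-v2-12b", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto"). trust_remote_code=True lets Transformers run the custom pipeline code stored in the model repo, and device_map="auto" relies on Accelerate to place the weights on the available devices. The load_dolly and answer helpers, and the lazy import that lets the logic be exercised with a stand-in callable, are our own additions. At 2 bytes per parameter in bfloat16, the 12B-parameter weights alone take roughly 24 GB, so plan for a large GPU or try the smaller dolly-v2-7b or dolly-v2-3b variants.

```python
from typing import Any, Callable, Dict, List

# What the Dolly pipeline returns per the model card: a list of dicts with "generated_text".
PipelineFn = Callable[[str], List[Dict[str, Any]]]


def load_dolly(model_name: str = "databricks/dolly-v2-12b") -> PipelineFn:
    """Load the instruction-following pipeline (downloads the weights on first use)."""
    # Heavy imports live here so the rest of this module works without them installed.
    import torch
    from transformers import pipeline

    return pipeline(
        model=model_name,
        torch_dtype=torch.bfloat16,  # 2 bytes per weight, half of float32
        trust_remote_code=True,      # run the pipeline code shipped in the model repo
        device_map="auto",           # let Accelerate spread layers over available devices
    )


def answer(generate_text: PipelineFn, instruction: str) -> str:
    """Send one instruction through a pipeline-like callable and return its text."""
    result = generate_text(instruction)
    return result[0]["generated_text"]


# Usage (needs transformers, accelerate, torch, and enough GPU memory):
# generate_text = load_dolly()
# print(answer(generate_text, "Explain to me the difference between nuclear fission and fusion."))
```

Passing the pipeline into answer, rather than creating it inside, keeps the expensive model load in one place and makes it easy to swap in a smaller variant or a stub for testing.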

Benefits of Dolly 2.0

Dolly 2.0's main advantage is its license: because the model can be used commercially, businesses and developers can build instruction-following features on top of it without depending on a proprietary model. Its open availability also makes it easier for others to study, fine-tune, and improve the model, which supports innovation in the AI industry and the development of more transparent, responsible AI technologies.

You can check out the Databricks dolly-v2-12b repo on Hugging Face.

Limitations of Dolly 2.0

Dolly 2.0 is not a state-of-the-art generative language model and is not designed to compete with more modern model architectures or with models pre-trained on larger corpora. It struggles with syntactically complex prompts, programming problems, mathematical operations, dates and times, open-ended question answering, enumerating lists of a specific length, stylistic mimicry, and humor, and it is prone to factual errors and hallucination.

In addition, Dolly 2.0's training data consists of natural-language instructions written by Databricks employees during March and April 2023, and it includes Wikipedia passages as reference text for instruction categories such as closed QA and summarization. Databricks states that it is not aware of the dataset containing obscenity, intellectual property, or personally identifying information about non-public figures, but the data may contain typos and factual errors. It also likely reflects the interests and semantic choices of Databricks employees, a demographic that is not representative of the global population at large.

Conclusion

Dolly 2.0 is an open, instruction-following LLM licensed for commercial use. It has clear limitations and does not match state-of-the-art models, but it gives businesses and developers a model they can run, fine-tune, and build on without proprietary restrictions. That makes it a significant step toward open, instruction-following LLMs for commercial use and a useful foundation for future work in the field.

