Yesterday, Apple made a significant announcement in the field of artificial intelligence (AI) by releasing several open-source large language models (LLMs) designed to run on-device rather than through cloud servers. The new models, called OpenELM (Open-source Efficient Language Models), are available on the Hugging Face Hub, a platform for sharing AI models and code.
In total, there are eight OpenELM models: four pre-trained using the CoreNet library and four instruction-tuned variants. Apple’s researchers used a layer-wise scaling strategy to improve accuracy and efficiency. In addition to the final trained models, Apple has also provided code, training logs, and multiple checkpoints of the models. This approach is aimed at promoting faster progress and “more trustworthy results” in the natural language AI field.
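For developers who want to try the released checkpoints, the snippet below is a minimal sketch of pulling one of the instruction-tuned models from the Hugging Face Hub with the transformers library. The repository and tokenizer identifiers are assumptions based on Apple’s public listing, so check the model cards for the exact names and any access requirements before running.

```python
# Minimal sketch: loading an OpenELM checkpoint from the Hugging Face Hub.
# The repository and tokenizer names below are assumptions -- consult the
# model card for the identifiers Apple actually specifies.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "apple/OpenELM-270M-Instruct"   # assumed checkpoint name
tokenizer_id = "meta-llama/Llama-2-7b-hf"  # assumed tokenizer, per the model card

tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,  # the repo ships its own modeling code
)

# Generate a short continuation as a smoke test.
inputs = tokenizer("Once upon a time there was", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```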
Apple’s decision to release the OpenELM models is part of a broader effort to “empower and strengthen the open research community” with state-of-the-art language models. By sharing open-source models, researchers are able to investigate risks as well as data and model biases. Developers and companies, on the other hand, can use the models as-is or modify them as needed.
The open sharing of information has also become an important recruiting tool for Apple. By giving researchers the chance to publish work that would not normally have seen the light of day under Apple’s secretive policies, the company is able to attract top engineers, scientists, and experts in the AI field.
While Apple has not yet brought these kinds of AI capabilities to its devices, iOS 18 is rumored to include a number of new AI features, and there are indications that Apple plans to run its large language models on-device for privacy reasons. This approach would let users benefit from the power of AI without sending their data to cloud servers.
The white paper outlined:
The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. For example, with a parameter budget of approximately one billion parameters, OpenELM exhibits a 2.36% improvement in accuracy compared to OLMo while requiring 2× fewer pre-training tokens. Diverging from prior practices that only provide model weights and inference code, and pre-train on private datasets, our release includes the complete framework for training and evaluation of the language model on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations. We also release code to convert models to MLX library for inference and fine-tuning on Apple devices. This comprehensive release aims to empower and strengthen the open research community, paving the way for future open research endeavors.
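To make the “layer-wise scaling” phrase from the abstract more concrete, the sketch below illustrates the general idea: rather than giving every transformer layer the same width, the attention head count and feed-forward expansion factor are varied from the first layer to the last so the parameter budget is spent where it helps most. The function, its arguments, and the numbers are hypothetical illustrations, not Apple’s implementation.

```python
# Illustrative sketch (not Apple's exact code) of layer-wise scaling:
# per-layer widths are obtained by linear interpolation across the depth
# of the network. All constants here are made up for illustration.
def layer_wise_scaling(num_layers, min_heads, max_heads, min_ffn_mult, max_ffn_mult):
    """Return per-layer (attention heads, FFN multiplier) pairs."""
    configs = []
    for i in range(num_layers):
        t = i / max(num_layers - 1, 1)  # 0.0 at the first layer, 1.0 at the last
        heads = round(min_heads + t * (max_heads - min_heads))
        ffn_mult = min_ffn_mult + t * (max_ffn_mult - min_ffn_mult)
        configs.append((heads, ffn_mult))
    return configs

# Example: a 12-layer model whose layers widen from 4 to 8 heads and whose
# FFN expansion grows from 2x to 4x the model dimension.
for layer, (heads, ffn_mult) in enumerate(layer_wise_scaling(12, 4, 8, 2.0, 4.0)):
    print(f"layer {layer:2d}: {heads} heads, FFN multiplier {ffn_mult:.2f}")
```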
