Inference setup

Integration choice

When building an iOS app around an LLM, the first decision is how to integrate the model into the app. In our case, there were four standard ways of running a model on-device:

coremltools, installed from pip
llama.cpp inference with a model in the .gguf format
Google's MediaPipe
ONNX

After working with each of these options for some time, we arrived at the following pros and cons:

	coremltools	llama.cpp	MediaPipe	ONNX
Pros	Easily integrated via Apple's CoreML	Gives the developer access to lower-level settings	Standard way of integrating Google's LLMs	Can be used with coremltools by running a single command
Cons	Not supported for now [08/03/2025]	Steep learning curve for beginners	Google Gemma 3n is not supported for now [08/03/2025]	Requires a high-performance Mac with 16+ GB of RAM and an Apple Silicon Pro or higher processor

Unfortunately, we couldn't use coremltools or ONNX, which are widely regarded as the best tools for running LLMs on iOS, so we narrowed the options down to llama.cpp and MediaPipe. MediaPipe then turned out to be unsuitable as well, because we found no way to convert Google Gemma 3n into the .task format. That left llama.cpp as the only option to try.

The following sections walk through each LLM integration step in the Gemergency iOS app. We start with the llama.cpp setup and finish with building our own SwiftUI iOS app.

llama.cpp setup

First, we need llama.cpp on macOS. To get it, clone the official repo on the Mac:

$ git clone --recursive https://github.com/ggml-org/llama.cpp.git && cd llama.cpp
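Both the clone and the build step that follows depend on a few command-line tools. The sketch below is one way to check for them up front. It assumes the tools involved are git, cmake, and xcodebuild (the last two are an assumption about what the build script calls), and missing_tools is a hypothetical helper name, not part of llama.cpp.

```shell
# Print the name of every tool from the argument list that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed tool list; adjust it to match your setup.
missing=$(missing_tools git cmake xcodebuild)
if [ -n "$missing" ]; then
  echo "Install these tools before building:" >&2
  echo "$missing" >&2
fi
```

If nothing is printed, all three tools were found.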

The clone command above also moves us into the root directory of llama.cpp. It contains the examples/llama.swiftui subdirectory, which is what we need. Before going there, though, we have to build the XCFramework that the SwiftUI iOS app will use. Run this command in the root directory of llama.cpp:

$ ./build-xcframework.sh

And that's it! We can now proceed to integrating Google Gemma 3n into the iOS app.
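As a sanity check on the steps above, the sketch below confirms that the expected pieces exist before moving on. The paths are assumptions: examples/llama.swiftui is the sample app directory in the llama.cpp checkout, and build-apple/llama.xcframework is where the build script is assumed to write its output, so adjust them if your checkout differs. check_paths is a hypothetical helper name.

```shell
# Return 0 only if every path given as an argument exists;
# report each missing path on stderr.
check_paths() {
  rc=0
  for item in "$@"; do
    if [ ! -e "$item" ]; then
      echo "missing: $item" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Assumed layout, relative to the llama.cpp root directory.
if check_paths examples/llama.swiftui build-apple/llama.xcframework; then
  echo "llama.cpp is ready for the iOS app"
else
  echo "re-run ./build-xcframework.sh from the llama.cpp root" >&2
fi
```

Run it from the llama.cpp root directory. Any missing path is listed on stderr.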
