Setting up an iOS app with Gemma 3n
Preparation
We were ready to build a brand-new SwiftUI iOS app. Before we began, we had to set up the XCFramework within the app. To start, we created a new SwiftUI project in Xcode by navigating to Xcode → File → New → Project, selecting SwiftUI as the primary UI framework, and creating a new app.
Next, we had to add the XCFramework we built earlier to the project. This can be done by dragging and dropping the framework into the project in Xcode. Once that was done, we could move on to integrating the necessary controllers into the app.
App setup
To work successfully with Gemma 3n, our SwiftUI iOS app requires two key controllers: LlamaState and LibLlama. Both can be found in llama.cpp/examples/llama.swiftui:
- LlamaState - acts as a bridge between the SwiftUI app and llama.cpp, using LibLlama
- LibLlama - serves as the core engine, wrapping the llama.cpp API to handle model loading and inference within the SwiftUI app
After adding these controllers to our SwiftUI project, we were ready to begin designing the app's user interface.
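To illustrate how the two controllers fit together, here is a minimal sketch of a SwiftUI view driven by LlamaState. The property and method names (messageLog, complete(text:)) follow the upstream llama.swiftui example and may differ slightly in your copy:

import SwiftUI

struct ChatView: View {
    // LlamaState is an ObservableObject, so the view refreshes as tokens stream in
    @StateObject private var llamaState = LlamaState()
    @State private var prompt = ""

    var body: some View {
        VStack {
            ScrollView {
                Text(llamaState.messageLog) // generated text published by LlamaState
            }
            TextField("Ask Gemma 3n...", text: $prompt)
                .textFieldStyle(.roundedBorder)
            Button("Send") {
                Task {
                    // complete(text:) forwards the prompt to LibLlama and streams the reply back
                    await llamaState.complete(text: prompt)
                }
            }
        }
        .padding()
    }
}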
Additional changes to the controllers
In addition to adding these controllers to the project, we also needed to modify them to ensure they functioned correctly.
First things first, we had to add these lines of code to the clear() method in LibLlama:
func clear() {
    tokens_list.removeAll()
    temporary_invalid_cchars.removeAll()
    llama_memory_clear(llama_get_memory(context), true)
    self.n_cur = 0       // <- add this line
    self.is_done = false // <- add this line
}
Without these lines of code, Gemma 3n won't respond to a second prompt. To clarify: the first prompt works as expected and receives a response, but the second prompt fails because the session cache isn't cleared between prompts.
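As a rough sketch, this is how a hypothetical send(prompt:) helper in LlamaState could reset the session before each new prompt. The completion_init(text:), completion_loop(), and is_done names follow the upstream llama.swiftui example and may differ in your copy:

func send(prompt: String) async {
    guard let llamaContext else { return }

    // Reset token buffers and the llama.cpp memory so the new prompt
    // starts from a clean state (this calls the clear() shown above).
    await llamaContext.clear()

    await llamaContext.completion_init(text: prompt)
    while !(await llamaContext.is_done) {
        let piece = await llamaContext.completion_loop()
        messageLog += piece
    }
}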
Next, we had to modify the create_context method in LibLlama:

static func create_context(path: String) throws -> LlamaContext {
    llama_backend_init()
    var model_params = llama_model_default_params() // <- add this line

#if targetEnvironment(simulator)
    model_params.n_gpu_layers = 0
    print("Running on simulator, force use n_gpu_layers = 0")
#endif
    model_params.n_gpu_layers = 0 // <- add this line

    let model = llama_model_load_from_file(path, model_params)
    guard let model else {
        print("Could not load model at \(path)")
        throw LlamaError.couldNotInitializeContext
    }

    let n_threads = max(1, min(8, ProcessInfo.processInfo.processorCount - 2))
    print("Using \(n_threads) threads")

    var ctx_params = llama_context_default_params()
    ctx_params.n_ctx = 2048
    ctx_params.n_threads = Int32(n_threads)
    ctx_params.n_threads_batch = Int32(n_threads)

    let context = llama_init_from_model(model, ctx_params)
    guard let context else {
        print("Could not load context!")
        throw LlamaError.couldNotInitializeContext
    }

    return LlamaContext(model: model, context: context)
}
Without these lines of code, the model won't load on physical devices. It may run successfully when launched from Xcode, but it will fail on an actual device when the app is distributed outside of Xcode, for example via TestFlight. Setting n_gpu_layers = 0 disables GPU offload, so inference runs entirely on the CPU.
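For completeness, here is a hedged sketch of how a model shipped inside the app bundle could be loaded through this factory method. The file name gemma-3n-E2B-it-Q4_K_M.gguf is only a placeholder for whatever GGUF file you bundle:

// Hypothetical example: load a GGUF model bundled with the app.
if let modelPath = Bundle.main.path(forResource: "gemma-3n-E2B-it-Q4_K_M", ofType: "gguf") {
    do {
        let llamaContext = try LlamaContext.create_context(path: modelPath)
        // llamaContext is now ready for completion_init / completion_loop calls
    } catch {
        print("Failed to initialize llama context: \(error)")
    }
} else {
    print("Model file not found in the app bundle")
}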
Other necessary settings and methods can be found in the GitHub repo of Gemergency.
Further steps
With that completed, we proceeded to develop the Gemergency iOS app. The next steps involved designing the UI with SwiftUI, integrating iOS system features, and implementing other core functionalities.