See Greenformer in action
Change the parameters and see the magic of Greenformer
Your typical modeling pipeline

from your_project import load_model, train, predict

model = load_model('transformer')
model = train(model)
score = predict(model)

Greenformer in your pipeline

from your_project import load_model, train, predict, get_submodules
from greenformer import auto_fact

model = load_model('transformer')

# Factorization-by-design: factorize before training
model = auto_fact(module=model, rank=64, solver='random', num_iter=50, submodules=None)
model = train(model)

# Post-training factorization restricted to selected submodules
submodules = get_submodules(model, 'bert-base')
model = auto_fact(module=model, rank=64, solver='svd', num_iter=50, submodules=submodules)

score = predict(model)

Try Greenformer for Factorization-by-design on Google Colab
Example demo output (typical pipeline vs. Greenformer pipeline):
Accuracy: 80.25% vs. 77.94%
GPU time: 50 s vs. 40 s
CPU time: 55 s vs. 45 s
Memory (MB): 100 vs. 100
Parameters (millions): 43 vs. 43
Next stop: Greenformer on Google Colab

Faster and more memory-efficient training without sacrificing performance

Factorization-by-design

Productionize a pre-trained model in a faster, lighter, and cheaper way

Post-training

More efficient few-shot in-context learning with billion-parameter models (see the sketch below)

In-context learning
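For the post-training and in-context learning scenarios, a pre-trained model can be factorized once and then used for inference directly. The sketch below assumes the Hugging Face transformers package and an illustrative model name; the auto_fact arguments simply mirror the pipeline snippet above.

from transformers import AutoModelForCausalLM, AutoTokenizer
from greenformer import auto_fact

tokenizer = AutoTokenizer.from_pretrained('gpt2')     # illustrative model choice
model = AutoModelForCausalLM.from_pretrained('gpt2')

# SVD keeps the factorized weights close to the pre-trained ones,
# so the model can be used for inference without further training.
model = auto_fact(module=model, rank=64, solver='svd', num_iter=50, submodules=None)
model.eval()

# Few-shot, in-context prompt run on the lighter, factorized model
prompt = "Review: great movie! Sentiment: positive\nReview: terrible plot. Sentiment:"
inputs = tokenizer(prompt, return_tensors='pt')
outputs = model.generate(**inputs, max_new_tokens=3)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))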
More about Greenformer
Greenformer is flexible, easy to use, and applicable to multiple scenarios.
Gain up to a 1.5x speedup while maintaining 93% of the original performance by using Greenformer.
Performance and efficiency trade-off
Performance and efficiency trade-off of applying Greenformer to:
Left: factorization-by-design, Center: post-training factorization, Right: in-context learning factorization
Greenformer decomposes the weights of the linear and convolution layers within a model using factorization solvers such as random, SVD, or SNMF. It replaces: 1) linear layers with Linear Encoder-Decoder (LED) layers and 2) convolution layers with Convolution Encoder-Decoder (CED) layers. Greenformer supports factorization with either a predefined static rank or dynamic per-layer ranks, computed as a ratio of each layer's maximum possible rank.
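To make the LED idea concrete, here is a minimal PyTorch sketch (an illustration, not the Greenformer implementation) that factorizes an nn.Linear into an encoder-decoder pair with truncated SVD; the helper name factorize_linear and its ratio argument are only illustrative, mirroring the static-rank and rank-ratio options described above.

import torch
import torch.nn as nn

def factorize_linear(layer, rank=None, ratio=None):
    # Replace Linear(in, out) with Linear(in, rank) -> Linear(rank, out) via truncated SVD.
    in_f, out_f = layer.in_features, layer.out_features
    if rank is None:
        # dynamic rank: a ratio of the layer's maximum possible rank
        rank = max(1, int(ratio * min(in_f, out_f)))
    # weight has shape (out, in); keep only the top-`rank` singular triplets
    U, S, Vh = torch.linalg.svd(layer.weight.data, full_matrices=False)
    encoder = nn.Linear(in_f, rank, bias=False)
    decoder = nn.Linear(rank, out_f, bias=layer.bias is not None)
    encoder.weight.data = Vh[:rank, :]            # shape (rank, in)
    decoder.weight.data = U[:, :rank] * S[:rank]  # shape (out, rank)
    if layer.bias is not None:
        decoder.bias.data = layer.bias.data.clone()
    return nn.Sequential(encoder, decoder)

dense = nn.Linear(768, 768)
led_like = factorize_linear(dense, rank=64)
print(sum(p.numel() for p in dense.parameters()))     # 590,592 parameters
print(sum(p.numel() for p in led_like.parameters()))  # 99,072 parameters

With rank 64, this 768x768 layer shrinks from roughly 590k to 99k parameters; auto_fact applies the same kind of substitution automatically across all eligible layers of a model.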
LED and CED layers in action
Left: Linear Encoder-Decoder (LED) layer, Right: Convolution Encoder-Decoder (CED) layer.
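The CED side can be sketched in the same spirit: one common low-rank layout is a k x k convolution into a small number of channels followed by a 1 x 1 convolution back out. This is only an illustration of the idea; the exact layout used by Greenformer's CED layer may differ.

import torch
import torch.nn as nn

in_ch, out_ch, k, rank = 256, 256, 3, 32

dense_conv = nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=1)
ced_like = nn.Sequential(
    nn.Conv2d(in_ch, rank, kernel_size=k, padding=1, bias=False),  # encoder: k x k into `rank` channels
    nn.Conv2d(rank, out_ch, kernel_size=1),                        # decoder: 1 x 1 back to `out_ch`
)

x = torch.randn(1, in_ch, 32, 32)
print(dense_conv(x).shape, ced_like(x).shape)              # identical output shapes
print(sum(p.numel() for p in dense_conv.parameters()),     # ~590k parameters
      sum(p.numel() for p in ced_like.parameters()))       # ~82k parameters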