Running Llama 3 on an i3 10th Gen
(Without a Separate GPU)
Recently I moved from Windows 11 to Ubuntu.
The last time I ran Llama 2 on Windows via WSL, it was painfully slow, generating about one word every 20-30 seconds.
Now I am on Ubuntu. LINUX.
Of course, it took Windows bugging me to the point of switching my OS.
The main reason: WINDOWS IS HEAVY.
Out of 8 GB of RAM, about 5 GB used to be eaten up doing nothing. By "nothing" I mean nothing heavy: around 5-6 Chrome tabs, QuickLook, maybe File Explorer, and of course Task Manager to watch it all happen.
QuickLook did not eat up much, probably a few negligible MBs, but the point is that Windows keeps adding unwanted, privacy-hostile features. You might think I am not that private a person, but I don't want a new LLM being trained on my data without my permission.
SO, after Copilot I stuck with Windows, but eventually, after the release of Copilot+ PCs and the updates that followed, I decided to just dump Windows and get something my computer would be happier with, instead of teasing it with heavy unwanted junk (mainly) + PRIVACY.
Finally on Ubuntu, while doing normal tasks (browsing, files, a messaging app, an office suite, images, settings, terminal, etc.), I tested Llama 3. I WAS BORED AND DOING RANDOM THINGS; of course anyone wants to tinker all the time on Linux, unless it's a production-grade machine...
And by luck, it was blazingly fast compared to the Windows runtime...
Now I am getting page-long answers in about 5-10 minutes, which is slower than ChatGPT, Bard, Copilot, or whatever else is cloud-based, but the main idea is speed with privacy: the best of both worlds. It's not the fastest, nothing like NetworkChuck's Terry or whatever, but look, it's an i3 10th Gen.
Now I've bored you enough... let me walk you through installing it.
STEP 1 - Install Ollama
Go to https://www.ollama.com/download, copy the install command shown there, paste it into the terminal, and you are practically done installing Ollama.
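For reference, this is roughly what the Linux one-liner on that page looked like at the time of writing (always copy the current command from the site itself):

    # Ollama's official Linux install script, as shown on ollama.com/download
    curl -fsSL https://ollama.com/install.sh | sh

    # Quick sanity check that the install worked
    ollama --version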
STEP 2 - Run llama3
Just run in the terminal: ollama run llama3
And there you are: it will download the model (around 4.7 GB for the default 8B build) and drop you straight into a chat.
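A first session looks roughly like this (the prompt below is just an example; type /bye to leave the chat):

    ollama run llama3

    >>> Why is the sky blue?
    (the answer streams in here, one token at a time)
    >>> /bye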
This took 3 minutes and 20 seconds, which is SLOW, but OK for an i3 with only integrated graphics.
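If you want exact numbers instead of a stopwatch, ollama run has a --verbose flag that prints timing stats (load time, eval rate in tokens per second) after each response:

    # Same chat, but with per-response timing stats printed
    ollama run llama3 --verbose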
And here are the system stats while it was running.
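To watch the load on your own machine, keep a second terminal open; any system monitor works, and I believe newer Ollama builds can also report what's loaded via ollama ps:

    htop         # per-core CPU and RAM usage
    free -h      # quick memory snapshot
    ollama ps    # loaded models and their memory footprint (recent Ollama builds)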
Conclusion

So that's it: Llama 3 running locally on an i3 10th Gen with no separate GPU. It's slow, but it's private, it's free, and on Ubuntu it's actually usable, unlike my old Llama 2 attempt on Windows via WSL. If you have an aging machine and 8 GB of RAM, give it a try.