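Meta has released LLaMA as open source. This post covers a CPU-only deployment: the dependencies are simple, and it does not need an export-restricted A100 GPU.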

Environment

OS: Ubuntu 22.04

CPU: Ryzen 5 5600X

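Building llama.cpp

The official LLaMA code needs a GPU with a lot of VRAM, whereas llama.cpp, a heavily modified port, only needs plenty of system RAM.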

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

Converting the model

I use conda to manage Python environments (see: how to install conda).

Create and activate the environment:

conda create -n pt-cpu python=3.10
conda activate pt-cpu
conda install pytorch torchvision torchaudio cpuonly -c pytorch sentencepiece
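
Before converting anything, it can help to confirm that the new environment really contains the packages installed above. A minimal sketch, assuming the import names are `torch` and `sentencepiece`; the helper name `missing_modules` is made up for illustration:

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names assumed for the packages installed with conda above.
    missing = missing_modules(["torch", "sentencepiece"])
    print("missing:", missing if missing else "none")
```

Run it inside the activated pt-cpu environment; anything it lists still needs to be installed.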

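The model files can be requested here; if you would rather not apply, there is also a shortcut.

Create a models folder in the project root and put the models in it, laid out as follows: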

models/
├── 13B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   ├── consolidated.01.pth
│   └── params.json
├── 30B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   ├── consolidated.01.pth
│   ├── consolidated.02.pth
│   ├── consolidated.03.pth
│   └── params.json
├── 65B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   ├── consolidated.01.pth
│   ├── consolidated.02.pth
│   ├── consolidated.03.pth
│   ├── consolidated.04.pth
│   ├── consolidated.05.pth
│   ├── consolidated.06.pth
│   ├── consolidated.07.pth
│   └── params.json
├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   └── params.json
├── tokenizer_checklist.chk
└── tokenizer.model
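
An incomplete download is easy to miss, so the layout can be checked before conversion. The sketch below is not part of llama.cpp; the shard counts are simply read off the listing above, and `check_model_dir` is a hypothetical helper name:

```python
from pathlib import Path

# Shard counts read off the directory listing above.
EXPECTED_SHARDS = {"7B": 1, "13B": 2, "30B": 4, "65B": 8}


def check_model_dir(models_root, size):
    """Return a list of problems found for one model size (empty if none)."""
    model_dir = Path(models_root) / size
    problems = []
    for name in ("checklist.chk", "params.json"):
        if not (model_dir / name).is_file():
            problems.append(f"missing {size}/{name}")
    shards = sorted(p.name for p in model_dir.glob("consolidated.*.pth"))
    expected = [f"consolidated.{i:02d}.pth" for i in range(EXPECTED_SHARDS[size])]
    if shards != expected:
        problems.append(f"{size}: expected {len(expected)} shard(s), found {len(shards)}")
    return problems


if __name__ == "__main__":
    # Only check the sizes that were actually downloaded.
    for size in EXPECTED_SHARDS:
        if (Path("models") / size).is_dir():
            print(size, check_model_dir("models", size) or "ok")
```

It only checks that files exist with the expected names; it does not verify the checksums in checklist.chk.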

Convert the 7B model:

python convert-pth-to-ggml.py models/7B/ 1
./quantize.sh 7B

This produces ggml-model-q4_0.bin.

Convert the 13B model:

python convert-pth-to-ggml.py models/13B/ 1
./quantize ./models/13B/ggml-model-f16.bin ./models/13B/ggml-model-q4_0.bin 2
./quantize ./models/13B/ggml-model-f16.bin.1 ./models/13B/ggml-model-q4_0.bin.1 2

This produces ggml-model-q4_0.bin and ggml-model-q4_0.bin.1.
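
The 13B model needs one quantize call per part file, and the larger models would need more. The sketch below generates those commands instead of typing them by hand. It assumes one f16 part per consolidated.*.pth shard and that the `.1` suffix pattern simply continues (`.2`, `.3`, ...) for 30B and 65B, which this post did not verify; `quantize_commands` is a hypothetical helper:

```python
def quantize_commands(size, n_parts):
    """Build one ./quantize argument list per f16 part file."""
    commands = []
    for part in range(n_parts):
        # The first part has no suffix; later parts are assumed to be .1, .2, ...
        suffix = "" if part == 0 else f".{part}"
        src = f"./models/{size}/ggml-model-f16.bin{suffix}"
        dst = f"./models/{size}/ggml-model-q4_0.bin{suffix}"
        # The trailing "2" is copied from the commands above.
        commands.append(["./quantize", src, dst, "2"])
    return commands


if __name__ == "__main__":
    for cmd in quantize_commands("13B", 2):
        print(" ".join(cmd))
```

For 13B with two parts this prints the same two quantize lines shown above; each list could also be passed to `subprocess.run` directly.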

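Running

Note that LLaMA is a GPT-3-style base model. Unlike ChatGPT (GPT-3.5), it is not an InstructGPT-like model fine-tuned with instruction tuning and reinforcement learning from human feedback (RLHF). So you cannot just ask it a question; you need to chant the right prompt "incantation". See the official Prompt Tips.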
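
Since a base model only continues text, one common workaround is to write the question as the start of its own answer, optionally preceded by a few worked examples. The sketch below shows that idea. The Q/A template and both helper names are hypothetical; only the `-m`, `-p`, `-t` and `-n` flags are taken from the commands in this post:

```python
def few_shot_prompt(examples, question):
    """Format (question, answer) pairs, then leave the last answer open."""
    lines = []
    for q, a in examples:
        lines.append(f"Q: {q}")
        lines.append(f"A: {a}")
        lines.append("")
    lines.append(f"Q: {question}")
    lines.append("A:")
    return "\n".join(lines)


def main_command(model_path, prompt, threads=6, n_tokens=512):
    """Argument list for ./main using only the flags shown in this post."""
    return ["./main", "-m", model_path, "-p", prompt,
            "-t", str(threads), "-n", str(n_tokens)]


if __name__ == "__main__":
    prompt = few_shot_prompt(
        [("What is the capital of France?", "Paris.")],
        "What is the capital of Japan?",
    )
    print(main_command("./models/7B/ggml-model-q4_0.bin", prompt))
```

Passing the result to `subprocess.run` as a list avoids shell quoting problems when the prompt contains quotes or newlines.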

7B model:

./main -m ./models/7B/ggml-model-q4_0.bin -p "Simply put, quantum entanglement states that" -t 6 -n 512

Memory usage is about 4.7 GB.

13B model:

./main -m ./models/13B/ggml-model-q4_0.bin -p "Simply put, quantum entanglement states that" -t 6 -n 512

Memory usage is about 8.4 GB.
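
Both memory figures are close to what a back-of-the-envelope estimate predicts. The sketch below rests on two assumptions that do not come from this post: parameter counts of roughly 6.7 and 13.0 billion (as reported in the LLaMA paper), and about 5 bits per weight for q4_0, that is, 4-bit values plus one 32-bit scale per block of 32 weights, which is my understanding of the format at the time of writing:

```python
def q4_0_weight_gb(n_params, block_size=32, scale_bytes=4):
    """Approximate size of q4_0 weights in GB (10**9 bytes)."""
    # Each block holds block_size 4-bit values plus one scale factor.
    bytes_per_block = block_size // 2 + scale_bytes
    return n_params / block_size * bytes_per_block / 1e9


if __name__ == "__main__":
    # Parameter counts are assumptions taken from the LLaMA paper.
    for name, n_params in [("7B", 6.7e9), ("13B", 13.0e9)]:
        print(f"{name}: ~{q4_0_weight_gb(n_params):.1f} GB of weights")
```

Under these assumptions the weights alone come to roughly 4.2 GB and 8.1 GB; the rest of the observed usage would be context and working buffers.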

The output:

 Simply put, quantum entanglement states that a pair of interacting particles can be described in terms as if the two were one single system or particle. This is what I mean by “entangled”:
Entangle two photons with each other and they will exhibit correlated behavior such that no matter how far apart they are, their actions affect (and are affected) instantaneously, even across vast distances in space-time . If we separate the entangles pair of particles spatially by some distance then any change to one particle’s state would be reflected with equal rapidity on its partner.
These two “entangled” photons can also interact and influence each other without being limited physically or electromagnetic in nature, meaning they do not have to share the same space-time coordinates (like a car wreck for example) nor are there any limitations imposed by distance between them; this means that information exchange could occur faster than light speed.
This is also known as quantum teleportation because of it’s ability to instantly transport the behavior or even matter across vast distances, although you can only move photons and not entire objects at once like a car wreck for example (those are called BH-entangled). This means that information exchange could occur faster than light speed.
Now here is where it gets really strange; because of the entangle two particles share their quantum state, no matter how far apart they may be in space and time at any given moment when one particle changes its behavior or goes through a change, then so does the other as if there were only “one” system interacting. If we separate these two systems by some distance from each others field of effect/influence (say for example: 1 light year apart) and they go their own ways to do something different than what was originally entangled, then at any moment the instantaneous change is reflected in both particles as if it were a “single system”.
This means that one particle has all of its information contained within itself but also includes a copy or replicate (if you will) of everything about every other subatomic constituent and elemental component inside another entangled pair; so to speak, the universe exists in multiples at once! It could be said that any given individual particle is not only an independent entity by definition it’s also like a hologram for all others because its entire history up until now will always remain part

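It goes on at length, and the wording is fine, but one major error stands out right away: the collapse of an entangled pair is nonlocal, yet it cannot be used to send information instantaneously. Information still cannot travel faster than light.

The Wikipedia portion of LLaMA's training set covers 20 languages, and Chinese is not among them, but I couldn't resist trying it anyway: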

[Screenshot: 2023-03-18_23-29]

The answer is beside the point, incoherent, and even contains wrong characters.
