GPU virtualization on PVE
Tag pve, GPU, virtualization, on by view 689

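The second-hand x79 machine that had been sitting idle at home for a while got PVE installed and now works as my server. It has an NVIDIA GTX 1060 6G card plugged in that was never put to use. After reading an introduction to cloud gaming architecture, I learned that GPUs can be virtualized too, so I decided to virtualize this card on PVE myself. That way it can be used from Linux guests as well as Windows guests, and the card does some work instead of going to waste.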

Environment

  • PVE: 7.3
  • GPU: NVIDIA GTX 1060 6G
  • Host platform: x79, dual E5, 64 GB RAM

At first I followed this article: install the required packages and the GPU driver on the host. Note that the GPU driver and mdevctl are the core pieces.

apt update && apt install dkms git build-essential pve-kernel-5.15 pve-headers-5.15 cargo jq uuid-runtime -y
wget -P /opt/ http://ftp.br.debian.org/debian/pool/main/m/mdevctl/mdevctl_0.81-1_all.deb
dpkg -i /opt/mdevctl_0.81-1_all.deb

Next, configure the kernel modules

echo vfio >> /etc/modules 
echo vfio_iommu_type1 >> /etc/modules 
echo vfio_pci >> /etc/modules 
echo vfio_virqfd >> /etc/modules
echo  "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
echo "options kvm ignore_msrs=1" > /etc/modprobe.d/kvm.conf 

# Update the initramfs
update-initramfs -k all -u

Configure the bootloader

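# Edit the GRUB config. Do not change it blindly; pick the settings that match your environment
nano /etc/default/grub
# Find this line:
GRUB_CMDLINE_LINUX_DEFAULT="quiet"
# and change it to:
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on"
# On an AMD CPU use this instead:
GRUB_CMDLINE_LINUX_DEFAULT="quiet amd_iommu=on"
# Regenerate the GRUB config
update-grub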

Reboot the host once.

Check that the IOMMU is enabled

If iommu group lines like the following show up, it worked
root@pve3:~# dmesg |grep iommu
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.11.22-7-pve root=/dev/mapper/pve-root ro quiet iommu=pt intel_iommu=on
[    0.075784] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.11.22-7-pve root=/dev/mapper/pve-root ro quiet iommu=pt intel_iommu=on
[    0.352588] iommu: Default domain type: Passthrough (set via kernel command line)
[    1.373583] pci 0000:00:00.0: Adding to iommu group 0
[    1.373592] pci 0000:00:02.0: Adding to iommu group 1
[    1.373605] pci 0000:00:14.0: Adding to iommu group 2
[    1.373613] pci 0000:00:17.0: Adding to iommu group 3
[    1.373623] pci 0000:00:1c.0: Adding to iommu group 4
[    1.373637] pci 0000:00:1d.0: Adding to iommu group 5
[    1.373647] pci 0000:00:1d.2: Adding to iommu group 6
[    1.373656] pci 0000:00:1d.3: Adding to iommu group 7
[    1.373675] pci 0000:00:1f.0: Adding to iommu group 8
[    1.373683] pci 0000:00:1f.2: Adding to iommu group 8
[    1.373691] pci 0000:00:1f.3: Adding to iommu group 8
[    1.373699] pci 0000:00:1f.4: Adding to iommu group 8
[    1.373707] pci 0000:00:1f.6: Adding to iommu group 9
[    1.373717] pci 0000:01:00.0: Adding to iommu group 10
[    1.373726] pci 0000:03:00.0: Adding to iommu group 11
[    1.373735] pci 0000:05:00.0: Adding to iommu group 12
[    1.656483]     intel_iommu=on

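Note: the log above must contain iommu group lines. If it does not, the IOMMU was not enabled. The most likely cause is that VT-d is disabled, or only partly enabled, in the BIOS. Set VT-d and the related BIOS options to enabled, reboot, and run the dmesg command above again to check the log.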
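The check described above can also be scripted. The following is a minimal sketch, assuming the dmesg text is already available as a string; iommu_groups is a hypothetical helper name, and it only looks for the "Adding to iommu group" lines shown in the log above.

```rust
use std::collections::BTreeSet;

/// Collect the distinct IOMMU group numbers mentioned in `dmesg` output.
/// (Hypothetical helper: it only matches the "Adding to iommu group N" lines.)
fn iommu_groups(dmesg: &str) -> BTreeSet<u32> {
    const MARKER: &str = "Adding to iommu group ";
    dmesg
        .lines()
        .filter_map(|line| {
            // Skip lines that do not mention an IOMMU group at all
            let idx = line.find(MARKER)?;
            line[idx + MARKER.len()..].trim().parse::<u32>().ok()
        })
        .collect()
}

fn main() {
    let sample = "[    1.373583] pci 0000:00:00.0: Adding to iommu group 0\n\
                  [    1.373592] pci 0000:00:02.0: Adding to iommu group 1\n\
                  [    1.656483]     intel_iommu=on";
    let groups = iommu_groups(sample);
    if groups.is_empty() {
        println!("no IOMMU groups found; check VT-d in the BIOS");
    } else {
        println!("found {} IOMMU groups", groups.len());
    }
}
```

An empty set corresponds to the failure case described above, where the kernel parameter is present but no groups were created.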

Install the driver

# Download the driver to /opt
wget https://foxi.buduanwang.vip/pan/foxi/Virtualization/vGPU/NVIDIA-Linux-x86_64-460.73.01-grid-vgpu-kvm-v5-5.15.run -P /opt

# Make the driver executable
chmod +x /opt/NVIDIA-Linux-x86_64-460.73.01-grid-vgpu-kvm-v5-5.15.run

# Install
sh -c /opt/NVIDIA-Linux-x86_64-460.73.01-grid-vgpu-kvm-v5-5.15.run

For the installation steps, see the article referenced above.

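Configure vgpu_unlock

vgpu_unlock unlocks the virtualization feature on consumer GPUs. NVIDIA consumer cards cannot enable vGPU by default; for a card with official virtualization support you would have to buy an NVIDIA Tesla or a similar model. The card here is a GTX 1060, so vgpu_unlock is needed to unlock vGPU.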

# Download vgpu_unlock-rs
cd /opt/ && git clone https://github.com/mbilker/vgpu_unlock-rs.git

# Build
cd /opt/vgpu_unlock-rs && git checkout v2.0.1 && cargo build --release

# Install vgpu_unlock
cp /opt/vgpu_unlock-rs/target/release/libvgpu_unlock_rs.so /lib/nvidia/libvgpu_unlock_rs.so

Reboot the host once.

Verify that the GPU driver and vgpu_unlock work. After the reboot, run nvidia-smi and check that it shows the GPU information as below.

➜  ~ nvidia-smi
Tue Nov 29 09:31:35 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01    Driver Version: 460.73.01    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  On   | 00000000:03:00.0 Off |                  N/A |
| 10%   53C    P8     9W / 120W |   4084MiB /  6143MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Run mdevctl types to verify that mdev devices show up

➜  ~ mdevctl types
0000:03:00.0
  nvidia-156
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-2B
    Description: num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=12
  nvidia-215
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-2B4
    Description: num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=12
  nvidia-241
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-1B4
    Description: num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=24
  nvidia-283
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-4C
    Description: num_heads=1, frl_config=60, framebuffer=4096M, max_resolution=4096x2160, max_instance=6
  nvidia-284
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-6C
    Description: num_heads=1, frl_config=60, framebuffer=6144M, max_resolution=4096x2160, max_instance=4
  nvidia-285
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-8C
    Description: num_heads=1, frl_config=60, framebuffer=8192M, max_resolution=4096x2160, max_instance=3
  nvidia-286
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-12C
    Description: num_heads=1, frl_config=60, framebuffer=12288M, max_resolution=4096x2160, max_instance=2
  nvidia-287
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-24C
    Description: num_heads=1, frl_config=60, framebuffer=24576M, max_resolution=4096x2160, max_instance=1
  nvidia-46
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-1Q
    Description: num_heads=4, frl_config=60, framebuffer=1024M, max_resolution=5120x2880, max_instance=24
  nvidia-47
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-2Q
    Description: num_heads=4, frl_config=60, framebuffer=2048M, max_resolution=7680x4320, max_instance=12
  nvidia-48
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-3Q
    Description: num_heads=4, frl_config=60, framebuffer=3072M, max_resolution=7680x4320, max_instance=8
  nvidia-49
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-4Q
    Description: num_heads=4, frl_config=60, framebuffer=4096M, max_resolution=7680x4320, max_instance=6
  nvidia-50
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-6Q
    Description: num_heads=4, frl_config=60, framebuffer=6144M, max_resolution=7680x4320, max_instance=4
  nvidia-51
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-8Q
    Description: num_heads=4, frl_config=60, framebuffer=8192M, max_resolution=7680x4320, max_instance=3
  nvidia-52
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-12Q
    Description: num_heads=4, frl_config=60, framebuffer=12288M, max_resolution=7680x4320, max_instance=2
  nvidia-53
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-24Q
    Description: num_heads=4, frl_config=60, framebuffer=24576M, max_resolution=7680x4320, max_instance=1
  nvidia-54
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-1A
    Description: num_heads=1, frl_config=60, framebuffer=1024M, max_resolution=1280x1024, max_instance=24
  nvidia-55
    Available instances: 10
    Device API: vfio-pci
    Name: GRID P40-2A
    Description: num_heads=1, frl_config=60, framebuffer=2048M, max_resolution=1280x1024, max_instance=12
  nvidia-56
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-3A
    Description: num_heads=1, frl_config=60, framebuffer=3072M, max_resolution=1280x1024, max_instance=8
  nvidia-57
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-4A
    Description: num_heads=1, frl_config=60, framebuffer=4096M, max_resolution=1280x1024, max_instance=6
  nvidia-58
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-6A
    Description: num_heads=1, frl_config=60, framebuffer=6144M, max_resolution=1280x1024, max_instance=4
  nvidia-59
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-8A
    Description: num_heads=1, frl_config=60, framebuffer=8192M, max_resolution=1280x1024, max_instance=3
  nvidia-60
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-12A
    Description: num_heads=1, frl_config=60, framebuffer=12288M, max_resolution=1280x1024, max_instance=2
  nvidia-61
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-24A
    Description: num_heads=1, frl_config=60, framebuffer=24576M, max_resolution=1280x1024, max_instance=1
  nvidia-62
    Available instances: 0
    Device API: vfio-pci
    Name: GRID P40-1B
    Description: num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=24

If either of the two checks above fails, inspect the logs of the nvidia-vgpud and nvidia-vgpu-mgr services with the following commands

journalctl -u  nvidia-vgpud
journalctl -u  nvidia-vgpu-mgr

Search for the errors in the log and try to fix them from there.

Configure the vGPU parameters, that is, the config file of the vgpu_unlock-rs installed earlier. Its path is /etc/vgpu_unlock/profile_override.toml and its content is as follows

➜  ~ cat /etc/vgpu_unlock/profile_override.toml
[profile.nvidia-55]
num_displays = 1
display_width = 1920
display_height = 1080
max_pixels = 2073600
cuda_enabled = 1
frl_enabled = 0

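Note: I chose the nvidia-55 virtual GPU here. It has 2 GB of video memory, so my 6 GB card can be split into 3 of them. You can pick a different profile; for the parameters of each profile, see the mdevctl types output above.

Note: do not set the three options framebuffer, pci_id and pci_device_id. It is hard to get their values right, and a wrong value prevents the VM from starting once the virtual GPU is attached. A common error looks like this:

Input/output error Verify all devices in group 29 are bound to vfio-<bus> or pci-stub and not already in use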
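The "6 GB card, 2 GB profile, 3 instances" estimate is plain integer division of the physical frame buffer by the profile's frame buffer. A small sketch of that arithmetic follows; max_vgpu_instances is a hypothetical helper, and it assumes frame buffer size is the only limit, which the driver may not agree with.

```rust
/// Rough upper bound on how many vGPU instances of one profile fit on a card,
/// assuming the frame buffer is the only constraint (an assumption, not a driver rule).
fn max_vgpu_instances(total_fb_mib: u32, profile_fb_mib: u32) -> u32 {
    if profile_fb_mib == 0 {
        0
    } else {
        total_fb_mib / profile_fb_mib
    }
}

fn main() {
    // GTX 1060 6G with the 2048M nvidia-55 profile from the mdevctl output
    println!("2048M profile: {}", max_vgpu_instances(6144, 2048));
    // The same card with a 4096M profile
    println!("4096M profile: {}", max_vgpu_instances(6144, 4096));
}
```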

For the full set of options, see the vgpu_unlock-rs project page

Now the virtual GPU can be created in the PVE web UI. First select the GTX 1060 as the device

0of8azqz

Then, for MDev, select nvidia-55, the profile configured earlier in vgpu_unlock/profile_override.toml

eshvg6wq

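The virtual GPU is now created. Boot the VM; in Windows, Device Manager shows an unknown display device. Install the driver: download it from here and pick the native grid_win10 driver. You can also download the GRID driver from the NVIDIA website, but reportedly the driver version in the guest must not be newer than the host driver version; I have not verified this. After installation it looks like this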

2wc85256

Next, add a GPU to the ubuntu-20.04 VM as well. The steps are the same; here is the result

root@ubuntu-gpu:~# lspci
00:00.0 Host bridge: Intel Corporation 440FX - 82441FX PMC [Natoma] (rev 02)
00:01.0 ISA bridge: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II]
00:01.1 IDE interface: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II]
00:01.2 USB controller: Intel Corporation 82371SB PIIX3 USB [Natoma/Triton II] (rev 01)
00:01.3 Bridge: Intel Corporation 82371AB/EB/MB PIIX4 ACPI (rev 03)
00:02.0 VGA compatible controller: Device 1234:1111 (rev 02)
00:03.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon
00:05.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
00:10.0 VGA compatible controller: NVIDIA Corporation GP102GL [Tesla P40] (rev a1)
00:12.0 Ethernet controller: Red Hat, Inc. Virtio network device
00:1e.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
00:1f.0 PCI bridge: Red Hat, Inc. QEMU PCI-PCI bridge
01:01.0 SCSI storage controller: Red Hat, Inc. Virtio SCSI

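The NVIDIA card shows up, which means the virtual GPU was added correctly. The GRID driver can be downloaded from the official website and installed.

Unfortunately, although GPU virtualization works, you need to buy an NVIDIA license to actually use it inside the VM. You have to admire how precisely Jensen Huang slices his product lines... 😅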


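Learning Rust: automatic growth of Vec
Tag rust, vector, auto-growth, on by view 76

In Go there is a pitfall I run into a lot: slice append. Go's dynamic array is called a slice, and append adds elements to it. When the slice runs out of capacity, Go reallocates automatically: it allocates a new block (roughly twice the old capacity for small slices) and then copies the existing data over, which makes the cost even higher.

So how does Rust handle growth of its dynamic array? Let's test it with some code.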

fn main() {
    let mut vec = Vec::with_capacity(100);

    // The vector contains no items, even though it has capacity for more
    println!("vec.len: {:?}", vec.len());
    println!("vec.cap: {:?}", vec.capacity());

    // These are all done without reallocating...
    for i in 0..100 {
        vec.push(i);
    }
    println!("vec.len: {:?}", vec.len());
    println!("vec.cap: {:?}", vec.capacity());

    // ...but this may make the vector reallocate
    vec.push(101);
    println!("vec.len: {:?}", vec.len());
    println!("vec.cap: {:?}", vec.capacity());
}

Output:

vec.len: 0
vec.cap: 100
vec.len: 100
vec.cap: 100
vec.len: 101
vec.cap: 200

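As you can see, when a Rust vector is full and another element is pushed, it reallocates, and here the new capacity is twice the old one (the growth factor is an implementation detail, not a guarantee). Unlike the Go behavior described above, Vec asks the allocator to grow the existing block, so the data may be extended in place; when the allocator cannot do that, it allocates a new block and copies, just like Go. So in performance-sensitive code I still do not recommend relying on automatic growth. Each doubling can leave up to half of the capacity unused, and if you depend on this feature without thinking about it, memory usage jumps in 2x steps as the vector grows. With large vectors that can push a program into OOM sooner than expected.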
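One way to avoid the growth steps is to reserve the capacity up front when the size is known. The sketch below counts how often the capacity changes while pushing, with and without preallocation; count_capacity_changes is a hypothetical helper written for this illustration.

```rust
/// Push `n` items into a Vec created with `initial_cap` and count
/// how many times the capacity changed along the way.
fn count_capacity_changes(initial_cap: usize, n: usize) -> usize {
    let mut v: Vec<usize> = Vec::with_capacity(initial_cap);
    let mut cap = v.capacity();
    let mut changes = 0;
    for i in 0..n {
        v.push(i);
        if v.capacity() != cap {
            // The vector had to grow to make room for this push
            changes += 1;
            cap = v.capacity();
        }
    }
    changes
}

fn main() {
    println!("no preallocation: {} capacity changes", count_capacity_changes(0, 1000));
    println!("preallocated:     {} capacity changes", count_capacity_changes(1000, 1000));
}
```

With enough capacity reserved, the push loop never triggers a reallocation, which is the behavior the first 100 pushes in the test program above rely on.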

Stay away from dynamic arrays, stay away from bugs.


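nginx upstream DNS resolution problem
Tag nginx, upstream, dns, on by view 131

Recently I noticed that several web sites hosted on my Hong Kong server become unreachable from time to time. The server runs trojan plus nginx. Traffic enters through trojan; part of it is forwarded out, and the rest is web site traffic that goes to nginx. This gives me both a traffic proxy and web access.

The Hong Kong node has had this web access problem a few times before, and restarting nginx always fixed it. This time I decided to look closer. I logged in and first checked the trojan log, which showed normal forwarding. Then I looked at the nginx log: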

2022/11/14 01:27:11 [error] 22#22: *473474 upstream timed out (110: Connection timed out) while connecting to upstream, client: 61.177.173.46, server: 0.0.0.0:22, upstream: "36.36.106.166:23000", bytes from/to client:0/0, bytes from/to upstream:0/0
2022/11/14 01:27:27 [error] 22#22: *473486 upstream timed out (110: Connection timed out) while connecting to upstream, client: 203.205.141.115, server: 0.0.0.0:22, upstream: "36.36.106.166:23000", bytes from/to client:0/0, bytes from/to upstream:0/0
2022/11/14 01:28:36 [error] 22#22: *473490 upstream timed out (110: Connection timed out) while connecting to upstream, client: 203.205.141.115, server: 0.0.0.0:22, upstream: "36.36.106.166:23000", bytes from/to client:0/0, bytes from/to upstream:0/0
2022/11/14 01:29:19 [error] 22#22: *473492 upstream timed out (110: Connection timed out) while connecting to upstream, client: 61.177.173.52, server: 0.0.0.0:22, upstream: "36.36.106.166:23000", bytes from/to client:0/0, bytes from/to upstream:0/0

The nginx log shows connection timeouts, so I checked with telnet whether the upstream is really unreachable

➜  trojan git:(master) telnet 36.36.106.166 23000
Trying 36.36.106.166...
^C

It really cannot connect. My web site config is as follows

server {
    listen 10110 ssl http2;
    server_name xxx.duguying.net;

    root /usr/share/nginx/html;
    index index.php index.html;
    ssl_certificate /data/certs/_.duguying.net.crt; 
    ssl_certificate_key /data/certs/_.duguying.net.key;
    ssl_stapling on;
    ssl_stapling_verify on;
    add_header Strict-Transport-Security "max-age=31536000";

    location / {
        include git.deny;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header Host            $http_host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_pass http://jxx.duguying.net:****;
        proxy_buffering    off;
        proxy_buffer_size  128k;
        proxy_buffers 100  128k;
    }
}

Web traffic is forwarded by the Hong Kong node to jxx.duguying.net, so let's dig it

➜  ~ dig jxx.duguying.net

; <<>> DiG 9.11.5-P4-5.1+deb10u7-Debian <<>> jxx.duguying.net
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 25787
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;jxx.duguying.net.             IN      A

;; ANSWER SECTION:
jxx.duguying.net.      60      IN      A       222.248.21.219

;; Query time: 22 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: 一 11月 14 11:17:24 CST 2022
;; MSG SIZE  rcvd: 51

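It turns out the node nginx forwards to is not the one the upstream domain currently resolves to. In other words, the DNS record of my upstream domain was updated, but nginx's resolution of it was not. After searching online I learned that when a domain name is used as an upstream, nginx resolves it when the configuration is loaded and does not refresh it in real time. Possible solutions:

Option 1: restart nginx whenever the DNS record changes (this is how the earlier outages were fixed)
Option 2: use the nginx resolver directive
Option 3: use Nginx-upstream-dynamic-server (nginx module)
Option 4: use ngx_upstream_jdomain (nginx module)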

Here is option 2. Add the resolver directives and put the upstream host name into a variable: nginx only re-resolves the name at run time through the resolver when proxy_pass contains a variable. The config becomes

server {
    listen 10110 ssl http2;
    server_name xxx.duguying.net;

    resolver 127.0.0.1 valid=60s;    # DNS server to query (a local one here); cache answers for 60s
    resolver_timeout 3s;             # timeout for a DNS lookup

    root /usr/share/nginx/html;
    index index.php index.html;
    ssl_certificate /data/certs/_.duguying.net.crt;
    ssl_certificate_key /data/certs/_.duguying.net.key;
    ssl_stapling on;
    ssl_stapling_verify on;
    add_header Strict-Transport-Security "max-age=31536000";

    location / {
        include git.deny;
        proxy_set_header X-Forwarded-For $remote_addr;
        proxy_set_header Host            $http_host;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        set $backend jxx.duguying.net;   # a variable makes nginx resolve the name at run time
        proxy_pass http://$backend:****;
        proxy_buffering    off;
        proxy_buffer_size  128k;
        proxy_buffers 100  128k;
    }
}
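To make the difference between "resolve once" and "resolve with an expiry" concrete, here is a small model in Rust. It is only an analogy for what valid=60s does, not nginx's implementation; CachedResolver and the fake lookup closure are hypothetical names introduced for this sketch.

```rust
use std::time::{Duration, Instant};

/// Caches one resolved address and looks it up again only after `ttl` has elapsed.
struct CachedResolver<F: FnMut() -> String> {
    resolve: F, // hypothetical lookup function, stands in for a real DNS query
    ttl: Duration,
    cached: Option<(String, Instant)>,
}

impl<F: FnMut() -> String> CachedResolver<F> {
    fn new(resolve: F, ttl: Duration) -> Self {
        CachedResolver { resolve, ttl, cached: None }
    }

    /// Return the cached address, refreshing it when it is older than `ttl`.
    fn get(&mut self, now: Instant) -> String {
        if let Some((addr, at)) = &self.cached {
            if now.duration_since(*at) < self.ttl {
                return addr.clone();
            }
        }
        let addr = (self.resolve)();
        self.cached = Some((addr.clone(), now));
        addr
    }
}

fn main() {
    // Fake DNS: the record changes between the first and the second lookup
    let mut answers = vec!["222.248.21.219".to_string(), "36.36.106.166".to_string()];
    let mut r = CachedResolver::new(move || answers.pop().unwrap(), Duration::from_secs(60));
    let t0 = Instant::now();
    println!("{}", r.get(t0)); // first lookup
    println!("{}", r.get(t0 + Duration::from_secs(30))); // still served from the cache
    println!("{}", r.get(t0 + Duration::from_secs(61))); // expired, looked up again
}
```

A resolver without expiry corresponds to an infinite ttl here: the stale address is served until the process restarts, which is the failure seen in the nginx log.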

Learning Rust: unpacking Option elegantly
Tag rust, option, on by view 436

In Rust, Option is an enum type for a value that may be absent. It is defined as follows

pub enum Option<T> {
    None,
    Some(T),
}

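As you can see, it holds either a value of type T or None (meaning no value). Option is useful whenever we get a value that may be empty, because it forces you to handle the None case. It has methods such as is_none and is_some to check whether it is empty. But if you use these two methods to unpack an Option, you cannot avoid an if/else, and the code gets ugly. For example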

let default: u32 = 0; // fallback value used when x is None
let x: Option<u32> = Some(2);
let mut value: u32 = 0;
if x.is_none() {
    value = default;
} else {
    value = x.unwrap(); // unwrap extracts the value from any Option, but panics on None
}

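The code above takes two statements: first initialize value, then check for None and reassign value depending on the case. That is clearly clumsy: unpacking an Option takes two steps, and value has to be declared mut. So for the requirement "unpack an Option and use a default when it is None", is there a more elegant way? Yes: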

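let default: u32 = 0; // fallback value used when x is None
let x: Option<u32> = Some(2);

// Method 1: match used as an expression
let value = match x {
    Some(val) => val,
    None => default,
};

// Method 2: if-let inside a block expression
let value = {
    if let Some(val) = x {
        val // val is already a u32 here, so it must not be dereferenced
    } else {
        default
    }
};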

Both methods unpack the Option in a single step using a block expression, without calling any methods.
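For comparison, the standard library also covers the "default when None" case with methods on Option. This is a short sketch using unwrap_or, unwrap_or_else and unwrap_or_default from std; the wrapper function names are made up for the example.

```rust
/// Use a fixed fallback value when the Option is None.
fn with_default(x: Option<u32>, default: u32) -> u32 {
    x.unwrap_or(default)
}

/// Compute the fallback lazily, only when the Option is None.
fn with_lazy_default(x: Option<u32>) -> u32 {
    x.unwrap_or_else(|| 40 + 2)
}

/// Fall back to the type's Default value (0 for u32).
fn with_type_default(x: Option<u32>) -> u32 {
    x.unwrap_or_default()
}

fn main() {
    println!("{}", with_default(Some(2), 7));
    println!("{}", with_default(None, 7));
    println!("{}", with_lazy_default(None));
    println!("{}", with_type_default(None));
}
```

Whether match, if-let or one of these methods reads best is a matter of taste; the match and if-let forms are more flexible when the None branch needs more than a value.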


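Replacing the battery of an iPhone XS
Tag iphone, battery, on by view 663

Recently my phone's battery stopped lasting. It often shuts down on its own with 20% charge left, and stops charging at 80%. So I decided to replace the battery.

I have replaced iPhone batteries before. When I did it on an iPhone SE, I followed the online tutorials and pulled the adhesive strips, which snapped again and again. In the end I had to bend and pry the battery out, and it was wrecked by the time it came off. Later I replaced a MacBook battery too: it had swollen after years of use, and the wear was high enough to trigger the battery service warning. The problem during disassembly was again the battery adhesive, and I had to rip the battery off by force. Then I heard that electronics cleaner removes battery adhesive easily. I happened to have a bottle from playing with flash chips, so I tried it, and it works great. One spray into the gap, and a battery that was nearly impossible to pull off came loose almost by itself.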

3qnjp5vj

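Note: this is the second battery replacement for this phone. In May I bought a Pisen battery, and it failed after five months, so the defect rate of Pisen batteries seems rather high. This time I am trying a Desay battery.

If you are replacing a battery yourself, I strongly recommend electronics cleaner for removing the adhesive. There is no need to roll up the adhesive bit by bit as the tutorial videos do; the cleaner lets you take the battery out with ease. One more thing: electronics cleaner is said to be toxic, so store it safely.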


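Picking up the DSLR again and trying macro photography
Tag DSLR, macro, on by view 278

I bought a Nikon D7200 years ago and barely used it. I took it along on my last trip to Zhongshan and was happy with the photos, and this time I learned to focus manually. So I bought a close-up lens to try macro photography and capture the details of flowers, plants, insects and ants.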

zz4mo02x

6rh4rexs

w5u4lg1m

e2sezosf

All the photos above were taken with the close-up lens and manual focus.


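Learning Rust: transcoding images in the browser with wasm
Tag wasm, rust, transcoding, on by view 401

The log reporting agent I wrote in Rust is nearly complete, which means one small Rust practice project has come to an end. So I picked a new one: compile Rust to wasm and do image scaling and transcoding in the browser. There are two reasons for doing this on the front end. First, I wanted to try Rust with WebAssembly. Second, image upload in my blog's admin panel needs work: it cannot upload DSLR photos directly, because they are over ten megabytes each and my cloud server has only 1M of bandwidth, so the upload times out. Even if I could live with the timeout, I cannot live with the server freezing under full IO load while it transcodes and compresses a large file on the back end. Hence this wasm experiment.

First of all, if you already know Rust well enough to write code in it, building wasm with Rust is a very pleasant experience. The Rust wasm toolchain is complete: you can build an npm package straight from a Rust project and publish the result to the npm registry. Here is a short walkthrough of developing a Rust-based wasm package.

First create a Rust library project. Note that it is not a binary.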

cargo new wtools --lib

Then edit Cargo.toml and define the crate type

[package]
name = "wtools"
version = "0.1.6"
edition = "2021"
description = "wasm tools"
license = "MIT"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[lib]
crate-type = ["cdylib", "rlib"] # cdylib builds the wasm library, rlib is a regular Rust library

[profile.release]
lto = true
opt-level = 'z'

[dependencies]

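Note that crate-type under [lib] must include cdylib; only this kind of crate can be compiled to wasm. Also pay attention to lto = true and opt-level = 'z' under profile.release. With these two options set, the compiler shrinks the wasm output as far as it can, which reduces the size of the file shipped over the network. There is also a tool called wasm-opt for shrinking wasm, but in my own tests, once lto and opt-level are set as above, it can shave off very little, and sometimes nothing at all.

Install the tooling. The wasm package is not built with plain cargo but with a tool called wasm-pack, whose advantage is that it builds an npm package directly. Install it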

cargo install wasm-pack

Build

wasm-pack build --scope duguying

Publish the npm package

cd pkg
npm publish --access=public

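That is the whole development workflow. Now a short look at the code. The goal of this Rust project is to compile to wasm and run in the browser, so it inevitably has to pass data between JS and Rust and to work with browser objects and elements from Rust. Two crates help here. The first is js-sys, for passing data between JS and Rust; it contains Rust types that correspond to the JS data types. The second is web-sys, for passing data between browser objects and Rust; it contains bindings for the various objects found in the browser.

For example, the most common browser logging call is console.log, and web-sys has the console object; see the documentation for details. In my Rust package I wrapped it lightly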

extern crate wasm_bindgen;
extern crate web_sys;

use wasm_bindgen::prelude::*;

#[macro_export]
macro_rules! console_log {
    ($($t:tt)*) => (web_sys::console::log(&js_sys::Array::of1(&JsValue::from(
        format_args!($($t)*).to_string()
    ))))
}

#[wasm_bindgen]
pub fn greet(name: &str) {
    console_log!("Hello, {}!", name);
}

With this in place, console_log can be called from anywhere else, for example

console_log!("load img failed, err: {:?}", error);

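I need to process images, so I use the image crate, which supports resizing (resize), rotating (rotate) and flipping (flipv) images, among other things. I mainly use resizing and rotation. One thing to note: structs, methods and functions that should be exported to JS need the #[wasm_bindgen] attribute. It is defined in the wasm_bindgen crate, which is the core crate for compiling Rust to wasm; see its documentation for details.

I also found that photos taken with a DSLR usually carry a rotation parameter that depends on how the photographer held the camera. This parameter is stored in the EXIF metadata, while the pixel data itself is stored in the camera's native orientation. As a result, a photo shot in portrait shows up sideways after it is uploaded to the server and needs to be rotated by 90 degrees. That is why I also use the kamadak-exif crate: it reads the EXIF data to get the orientation, and rotate is then called accordingly to fix the direction of the photo. The image processing code is as follows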

extern crate wasm_bindgen;

use exif::{Error, Exif, In, Tag};
use image::{imageops::FilterType, DynamicImage, EncodableLayout, ImageFormat};
use js_sys::Uint8Array;
use std::io::{Cursor, Read, Seek, SeekFrom};
use wasm_bindgen::prelude::*;

use crate::console_log;

#[wasm_bindgen]
pub struct Img {
    img: DynamicImage,
    img_format: ImageFormat,
    exif: Result<Exif, Error>,
    orientation: u32,
}

#[wasm_bindgen]
impl Img {
    #[wasm_bindgen(constructor)]
    pub fn new(img: &[u8], mime: &str) -> Img {
        let exifreader = exif::Reader::new();
        let (img_data, img_format) = Img::load_image_from_array(img, mime.to_string());
        let mut c = Cursor::new(Vec::from(img));
        let exif = exifreader.read_from_container(&mut c);

        let mut image = Img {
            img: img_data,
            img_format: img_format,
            exif: exif,
            orientation: 0,
        };
        image.get_orietation();
        image.fix_orietation();
        image
    }

    fn load_image_from_array(_array: &[u8], mime: String) -> (DynamicImage, ImageFormat) {
        let img_format = ImageFormat::from_mime_type(mime).unwrap();
        let img = match image::load_from_memory_with_format(_array, img_format) {
            Ok(img) => img,
            Err(error) => {
                console_log!("load img failed, err: {:?}", error);
                panic!("{:?}", error)
            }
        };
        return (img, img_format);
    }

    fn get_orietation(&mut self) {
        match &self.exif {
            Ok(exif) => {
                let r = exif.get_field(Tag::Orientation, In::PRIMARY);
                // Only read and log the orientation when the tag is present;
                // calling unwrap() on a missing tag would panic
                if let Some(oriet) = r {
                    self.orientation = oriet.value.get_uint(0).unwrap_or(0);
                    console_log!("orientation: {:?}", oriet);
                }
            }
            Err(_error) => {}
        };
    }

    fn fix_orietation(&mut self) {
        match self.orientation {
            8 => self.img = self.img.rotate270(),
            3 => self.img = self.img.rotate180(),
            6 => self.img = self.img.rotate90(),
            _ => {}
        }
    }

    fn image_to_uint8_array(&self, img: DynamicImage) -> Uint8Array {
        // Create an in-memory buffer to encode into
        let mut c = Cursor::new(Vec::new());
        match img.write_to(&mut c, self.img_format) {
            Ok(c) => c,
            Err(error) => {
                panic!(
                    "There was a problem writing the resulting buffer: {:?}",
                    error
                )
            }
        };
        c.seek(SeekFrom::Start(0)).unwrap();
        let mut out = Vec::new();
        // Read the encoded data back from the buffer
        c.read_to_end(&mut out).unwrap();
        let v = out.as_bytes();
        Uint8Array::from(v)
    }

    pub fn get_width(&self) -> u32 {
        return self.img.width();
    }

    pub fn get_height(&self) -> u32 {
        return self.img.height();
    }

    pub fn grayscale(&self) -> Uint8Array {
        let img = self.img.grayscale();
        self.image_to_uint8_array(img)
    }

    pub fn scale(&self, width: u32, height: u32) -> Uint8Array {
        let img = self.img.resize(width, height, FilterType::Triangle);
        self.image_to_uint8_array(img)
    }

    pub fn rotate90(&self) -> Uint8Array {
        let img = self.img.rotate90();
        self.image_to_uint8_array(img)
    }

    pub fn rotate180(&self) -> Uint8Array {
        let img = self.img.rotate180();
        self.image_to_uint8_array(img)
    }

    pub fn rotate270(&self) -> Uint8Array {
        let img = self.img.rotate270();
        self.image_to_uint8_array(img)
    }

    pub fn flipv(&self) -> Uint8Array {
        let img = self.img.flipv();
        self.image_to_uint8_array(img)
    }

    pub fn fliph(&self) -> Uint8Array {
        let img = self.img.fliph();
        self.image_to_uint8_array(img)
    }
}

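After building, packaging and publishing to the npm registry, there is one thing to watch out for when using the package in a front-end project. A wasm-based npm package cannot be pulled in with a plain import like a regular npm package; it has to be imported asynchronously. This is quite inelegant:

/**
 * @description Register the md5 tool globally
 */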
async function waitwasm () {
  const { Crypt, Img } = await import('@duguying/wtools')
  Vue.prototype.$md5 = (content) => {
    let crypt = new Crypt()
    let out = crypt.md5(content)
    crypt.free()
    return out
  }
  Vue.prototype.$scale_img = (file) => {
    return new Promise(function (resolve, reject) {
      let reader = new FileReader()
      reader.readAsArrayBuffer(file)
      reader.onload = function () {
        let data = new Uint8Array(this.result)
        console.log('data:', data)
        let kit = new Img(data, file.type)
        console.log(kit)
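        let w = kit.get_width()
        let h = kit.get_height()
        console.log('wh:', w, h)
        if (w > 2000) {
          // Compute the new height from the original width before overwriting w
          h = Math.round(h * 2000 / w)
          w = 2000
        } else {
          kit.free()
          resolve(file)
          return
        }
        let out = kit.scale(w, h)
        kit.free()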
        resolve(new Blob([out.buffer], { type: file.type }))
      }
    })
  }
}
(async () => {
  waitwasm()
})()

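The import itself is asynchronous, but its methods can only be called once the import has finished, otherwise an error is thrown. So the only option here is to wait until the import has completed.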
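The size calculation in $scale_img (cap the width at 2000 and keep the aspect ratio) is easy to get wrong when the width is overwritten before the height is computed. Here is a sketch of the same arithmetic in Rust; fit_width is a hypothetical helper and is not part of the wtools package.

```rust
/// Scale (w, h) down so that the width is at most `max_w`, keeping the aspect ratio.
/// Images that are already narrow enough are returned unchanged.
fn fit_width(w: u32, h: u32, max_w: u32) -> (u32, u32) {
    if w <= max_w || w == 0 {
        return (w, h);
    }
    // Use u64 so that h * max_w cannot overflow, and round to the nearest pixel
    let new_h = (h as u64 * max_w as u64 + w as u64 / 2) / w as u64;
    (max_w, new_h.max(1) as u32)
}

fn main() {
    // A typical 3:2 DSLR frame
    let (w, h) = fit_width(6000, 4000, 2000);
    println!("{}x{}", w, h);
    // Small images pass through untouched
    let (w, h) = fit_width(1000, 500, 2000);
    println!("{}x{}", w, h);
}
```

The key point is that the new height is derived from the original width; once the width has been replaced by the limit, the ratio is lost.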


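A trip to the zoo
Tag zoo, on by view 276

Before the 5th I was in Shenzhen, mostly visiting the Shenzhen Zoo

ovmp23xu Zongzi at the zoo

a9tc848z Flamingos at the zoo

On the 5th and 6th I was in Zhongshan

v549tnop Sun Yat-sen's hometown

pn40sa8t Zhongshan Film and TV City

0tmilgih Yunti Mountain

And then, starting the day after I got back, I "won" a yellow health code three times 😂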


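Learning Rust: pinning threads to cores
Tag core pinning, threads, on by view 616

In Rust, the core_affinity crate can pin threads to CPU cores. While using it I ran into a problem with pinning the main thread: if the main thread is pinned first and then spawns a child thread, the child thread finds only one core available when it tries to pin itself. So with this crate, if the main thread needs to be pinned, that has to happen after all child threads have been created.

The pinning function is as follows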

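use core_affinity::CoreId;
use once_cell::sync::Lazy; // assumption: Lazy comes from the once_cell crate

static CORE_IDS: Lazy<Vec<CoreId>> =
    Lazy::new(|| core_affinity::get_core_ids().unwrap()); // I also tried putting CORE_IDS behind a lock; same result

fn bind_core(id: usize) {
    // Fall back to the first core when the requested id is not found
    let mut selected_id = CORE_IDS[0];
    for core_id in CORE_IDS.iter() {
        // Print the core being inspected and the core we want to bind to
        println!("core {:?}, bind to: {:?}", core_id, id);
        if core_id.id == id {
            selected_id = *core_id;
            break;
        }
    }

    core_affinity::set_for_current(selected_id);
}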
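Note that bind_core silently falls back to the first core in the list when the requested id is missing, which hides exactly the situation investigated in this post. A sketch of a stricter selection step follows; the CoreId struct below is a local stand-in that mirrors the id field of core_affinity::CoreId, and select_core is a hypothetical helper.

```rust
/// Local stand-in for core_affinity::CoreId so the sketch compiles without the crate.
#[derive(Debug, Clone, Copy, PartialEq)]
struct CoreId {
    id: usize,
}

/// Return the wanted core only if it is really in the list of available cores.
/// Returning None makes a shrunken core list visible instead of hiding it.
fn select_core(available: &[CoreId], wanted: usize) -> Option<CoreId> {
    available.iter().copied().find(|c| c.id == wanted)
}

fn main() {
    let all: Vec<CoreId> = (0..8).map(|id| CoreId { id }).collect();
    let only_seven = vec![CoreId { id: 7 }];

    println!("{:?}", select_core(&all, 6)); // the core exists
    println!("{:?}", select_core(&only_seven, 6)); // the list no longer contains core 6
}
```

With a real CoreId, the caller could log the None case and decide whether to skip pinning, rather than ending up on the same core as another thread.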

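When the main thread is pinned before the child thread is created, this happens: the first enumeration, in the main thread, finds 8 cores and the thread is pinned to core 7 as requested. The second enumeration, in the child thread, finds only 1 core, core 7, so the child can only be pinned to core 7 and ends up sharing a core with the main thread.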

core_id num: 8, cores: [CoreId { id: 0 }, CoreId { id: 1 }, CoreId { id: 2 }, CoreId { id: 3 }, CoreId { id: 4 }, CoreId { id: 5 }, CoreId { id: 6 }, CoreId { id: 7 }]

print core_id: CoreId { id: 0 }, bind id: 7
print core_id: CoreId { id: 1 }, bind id: 7
print core_id: CoreId { id: 2 }, bind id: 7
print core_id: CoreId { id: 3 }, bind id: 7
print core_id: CoreId { id: 4 }, bind id: 7
print core_id: CoreId { id: 5 }, bind id: 7
print core_id: CoreId { id: 6 }, bind id: 7
print core_id: CoreId { id: 7 }, bind id: 7
core_id num: 1, cores: [CoreId { id: 7 }]

print core_id: CoreId { id: 7 }, bind id: 6

After changing the code so that the main thread is pinned after the child thread is created:

fn main() {
    let cfg = config::cfg::get_config();
    let filename = cfg.las.as_ref().unwrap().access_log.as_ref().unwrap();
    let mut watcher = LogWatcher::register(filename.to_string()).unwrap();
    let poly: Poly = Poly::new(); // this call spawns the child thread
    config::affinity::bind_core_follow_config(0); // pin the main thread

    watcher.watch(&mut |line: String| {
        poly.clone().push(line);
        LogWatcherAction::None
    })
}

Now both selections succeed: core 7 for the main thread and core 6 for the child thread. Output:

core_id num: 8, cores: [CoreId { id: 0 }, CoreId { id: 1 }, CoreId { id: 2 }, CoreId { id: 3 }, CoreId { id: 4 }, CoreId { id: 5 }, CoreId { id: 6 }, CoreId { id: 7 }]

print core_id: CoreId { id: 0 }, bind id: 7
print core_id: CoreId { id: 1 }, bind id: 7
print core_id: CoreId { id: 2 }, bind id: 7
print core_id: CoreId { id: 3 }, bind id: 7
print core_id: CoreId { id: 4 }, bind id: 7
print core_id: CoreId { id: 5 }, bind id: 7
print core_id: CoreId { id: 6 }, bind id: 7
print core_id: CoreId { id: 7 }, bind id: 7
core_id num: 8, cores: [CoreId { id: 0 }, CoreId { id: 1 }, CoreId { id: 2 }, CoreId { id: 3 }, CoreId { id: 4 }, CoreId { id: 5 }, CoreId { id: 6 }, CoreId { id: 7 }]

print core_id: CoreId { id: 0 }, bind id: 6
print core_id: CoreId { id: 1 }, bind id: 6
print core_id: CoreId { id: 2 }, bind id: 6
print core_id: CoreId { id: 3 }, bind id: 6
print core_id: CoreId { id: 4 }, bind id: 6
print core_id: CoreId { id: 5 }, bind id: 6
print core_id: CoreId { id: 6 }, bind id: 6

Both selection passes were also able to print all cores.

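Next I tried the following: the main thread first spawns thread A, which pins itself (core 5); then the main thread pins itself (core 7); finally the worker thread is created and pins itself (core 6). Code: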

fn main() {
    let cfg = config::cfg::get_config();
    let filename = cfg.las.as_ref().unwrap().access_log.as_ref().unwrap();
    let mut watcher = LogWatcher::register(filename.to_string()).unwrap();

    thread::Builder::new()
        .name("A".into())
        .spawn(|| {
            config::affinity::bind_core_follow_config(2);
            loop {
                sleep(Duration::from_secs(1));
            }
        })
        .unwrap();

    config::affinity::bind_core_follow_config(0);
    sleep(Duration::from_secs(2));

    let poly: Poly = Poly::new(); // worker

    watcher.watch(&mut |line: String| {
        poly.clone().push(line);
        LogWatcherAction::None
    })
}

Result:

core CoreId { id: 0 }, bind to: 7
core CoreId { id: 1 }, bind to: 7
core CoreId { id: 2 }, bind to: 7
core CoreId { id: 3 }, bind to: 7
core CoreId { id: 4 }, bind to: 7
core CoreId { id: 5 }, bind to: 7
core CoreId { id: 6 }, bind to: 7
core CoreId { id: 7 }, bind to: 7
>>> core CoreId { id: 7 }, bind to: Thread { id: ThreadId(1), name: Some("main"), .. }

core CoreId { id: 0 }, bind to: 5
core CoreId { id: 1 }, bind to: 5
core CoreId { id: 2 }, bind to: 5
core CoreId { id: 3 }, bind to: 5
core CoreId { id: 4 }, bind to: 5
core CoreId { id: 5 }, bind to: 5
>>> core CoreId { id: 5 }, bind to: Thread { id: ThreadId(2), name: Some("A"), .. }

core CoreId { id: 7 }, bind to: 6
>>> core CoreId { id: 7 }, bind to: Thread { id: ThreadId(3), name: Some("worker"), .. }

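As you can see, the number of cores seen by the worker thread is wrong. The worker thread is somewhat complex internally: it does data processing and reports through tokio async calls. But when I swap the worker for a trivial stand-in thread, the problem goes away. Code: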

fn main() {
    let cfg = config::cfg::get_config();
    let filename = cfg.las.as_ref().unwrap().access_log.as_ref().unwrap();
    let mut watcher = LogWatcher::register(filename.to_string()).unwrap();

    thread::Builder::new()
        .name("A".into())
        .spawn(|| {
            config::affinity::bind_core_follow_config(2);
            loop {
                sleep(Duration::from_secs(1));
            }
        })
        .unwrap();

    config::affinity::bind_core_follow_config(0);
    sleep(Duration::from_secs(2));

    // let poly: Poly = Poly::new(); // worker

    thread::Builder::new()
        .name("B worker".into())
        .spawn(|| {
            config::affinity::bind_core_follow_config(1);
            loop {
                sleep(Duration::from_secs(1));
            }
        })
        .unwrap();

    watcher.watch(&mut |line: String| {
        // poly.clone().push(line);
        LogWatcherAction::None
    })
}

Here B worker is the thread that replaces worker. Result:

core CoreId { id: 0 }, bind to: 7
core CoreId { id: 1 }, bind to: 7
core CoreId { id: 2 }, bind to: 7
core CoreId { id: 3 }, bind to: 7
core CoreId { id: 4 }, bind to: 7
core CoreId { id: 5 }, bind to: 7
core CoreId { id: 6 }, bind to: 7
core CoreId { id: 7 }, bind to: 7
>>> core CoreId { id: 7 }, bind to: Thread { id: ThreadId(1), name: Some("main"), .. }

core CoreId { id: 0 }, bind to: 5
core CoreId { id: 1 }, bind to: 5
core CoreId { id: 2 }, bind to: 5
core CoreId { id: 3 }, bind to: 5
core CoreId { id: 4 }, bind to: 5
core CoreId { id: 5 }, bind to: 5
>>> core CoreId { id: 5 }, bind to: Thread { id: ThreadId(2), name: Some("A"), .. }

core CoreId { id: 0 }, bind to: 6
core CoreId { id: 1 }, bind to: 6
core CoreId { id: 2 }, bind to: 6
core CoreId { id: 3 }, bind to: 6
core CoreId { id: 4 }, bind to: 6
core CoreId { id: 5 }, bind to: 6
core CoreId { id: 6 }, bind to: 6
>>> core CoreId { id: 6 }, bind to: Thread { id: ThreadId(3), name: Some("B worker"), .. }

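Conclusion: the core list returned by core_affinity::get_core_ids is not deterministic. In the crate author's repository I found a related issue whose reporter describes the same behavior and suspects that get_core_ids does not return all cores, but only the cores available to the current thread.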
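
If the issue reporter's reading is right, one possible workaround is to read the core list once, early in main and before any thread has been bound, and hand that snapshot to every thread instead of calling get_core_ids again inside each one. I have not verified this against the project above. Below is a minimal sketch of the idea using only the standard library. The names pick_core and resolve_all are hypothetical, the snapshot is a hard-coded stand-in for what get_core_ids would return, and the actual binding call is left out.

```rust
use std::sync::Arc;
use std::thread;

/// Look up the configured core id in a snapshot taken once at startup.
fn pick_core(snapshot: &[usize], wanted: usize) -> Option<usize> {
    snapshot.iter().copied().find(|&id| id == wanted)
}

/// Spawn one named thread per wanted core id. Each thread resolves its core
/// from the shared snapshot instead of querying the system again.
fn resolve_all(
    snapshot: Vec<usize>,
    wanted: Vec<(&'static str, usize)>,
) -> Vec<(String, Option<usize>)> {
    let snapshot = Arc::new(snapshot);
    let handles: Vec<_> = wanted
        .into_iter()
        .map(|(name, id)| {
            let snapshot = Arc::clone(&snapshot);
            thread::Builder::new()
                .name(name.into())
                .spawn(move || {
                    // A real program would call its binding routine here
                    // with the chosen core (assumption: omitted in this sketch).
                    (name.to_string(), pick_core(&snapshot, id))
                })
                .unwrap()
        })
        .collect();
    // Join in spawn order so the result order matches `wanted`.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

fn main() {
    // Hypothetical snapshot: core ids 0..=7, as the main thread saw them.
    let snapshot: Vec<usize> = (0..8).collect();
    let result = resolve_all(snapshot, vec![("A", 5), ("worker", 6)]);
    println!("{:?}", result);
}
```

With a full snapshot, the worker finds core 6 no matter which thread spawned it. With a snapshot containing only core 7, the lookup returns None, which is the situation the worker thread hit in the first run.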


Learning Rust: async and await with tokio
Tag rust, tokio, async, await, on by view 539

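Recently the monitoring agent I work on needed to report data to Elasticsearch (ES), so I used the elasticsearch-rs package, the official Rust SDK for ES. Judging by the version numbers, all current releases are still alpha.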

Below is a simple example of the problem I ran into. First, calling the SDK requires the calling function to be marked async.

use elasticsearch::{http::transport::Transport, BulkParts, Elasticsearch};
use serde_json::Value;

fn main() {
    println!("Hello, world!");
    send_es("hi".to_string());
    println!("hi");
}

pub async fn send_es(body: String) {
    let transport = Transport::single_node("http://xxx.xxx.net").unwrap();
    let client = Elasticsearch::new(transport);
    let mut bodies: Vec<String> = Vec::with_capacity(1);
    bodies.push(body);
    let response = client
        .bulk(BulkParts::Index("nginx"))
        .body(bodies)
        .send()
        .await
        .unwrap();
    let response_body = response.json::<Value>().await.unwrap();
    println!("{:?}", response_body);
}

Running it produces a warning, and the body of send_es is never executed:

➜  tes git:(master) ✗ cargo run
   Compiling tes v0.1.0 (/root/code/las/tes)
warning: unused implementer of `Future` that must be used
 --> src/main.rs:6:5
  |
6 |     send_es("hi".to_string());
  |     ^^^^^^^^^^^^^^^^^^^^^^^^^^
  |
  = note: `#[warn(unused_must_use)]` on by default
  = note: futures do nothing unless you `.await` or poll them

warning: `tes` (bin "tes") generated 1 warning
    Finished dev [unoptimized + debuginfo] target(s) in 3.16s
     Running `target/debug/tes`
Hello, world!
hi

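The warning says futures do nothing, meaning send_es does nothing unless the future it returns is awaited or polled. Later I found block_on (from the futures crate), which can run an async function, so the code became: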

use elasticsearch::{http::transport::Transport, BulkParts, Elasticsearch};
use futures::executor::block_on;
use serde_json::Value;

fn main() {
    println!("Hello, world!");
    block_on(send_es("hi".to_string()));
    println!("hi");
}

pub async fn send_es(body: String) {
    let transport = Transport::single_node("http://xxx.xxx.net").unwrap();
    let client = Elasticsearch::new(transport);
    let mut bodies: Vec<String> = Vec::with_capacity(1);
    bodies.push(body);
    let response = client
        .bulk(BulkParts::Index("nginx"))
        .body(bodies)
        .send()
        .await
        .unwrap();
    let response_body = response.json::<Value>().await.unwrap();
    println!("{:?}", response_body);
}

But running it fails with the following error:

➜  tes git:(master) ✗ cargo run
   Compiling tes v0.1.0 (/root/code/las/tes)
    Finished dev [unoptimized + debuginfo] target(s) in 3.52s
     Running `target/debug/tes`
Hello, world!
thread 'main' panicked at 'there is no reactor running, must be called from the context of a Tokio 1.x runtime', /root/.cargo/registry/src/github.com-1ecc6299db9ec823/hyper-0.14.20/src/client/connect/dns.rs:121:24
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
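
The panic message is easier to read with a rough mental model of how a runtime makes itself visible to library code: it keeps a handle in thread-local storage, set only while the runtime is driving that thread, and library code that needs runtime services looks the handle up. The sketch below is a toy model of that mechanism using only the standard library. RUNTIME_ACTIVE, enter_runtime, and needs_reactor are hypothetical names for illustration, not Tokio or hyper APIs.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical stand-in for the handle a real runtime keeps per thread.
    static RUNTIME_ACTIVE: Cell<bool> = Cell::new(false);
}

/// Toy version of "entering" a runtime: the context is marked active
/// only while `f` runs.
fn enter_runtime<T>(f: impl FnOnce() -> T) -> T {
    RUNTIME_ACTIVE.with(|flag| flag.set(true));
    let out = f();
    RUNTIME_ACTIVE.with(|flag| flag.set(false));
    out
}

/// Toy version of library code that needs runtime services
/// (I/O, timers, a blocking pool) and checks for the context first.
fn needs_reactor() -> Result<&'static str, &'static str> {
    if RUNTIME_ACTIVE.with(|flag| flag.get()) {
        Ok("request sent")
    } else {
        Err("there is no reactor running")
    }
}

fn main() {
    // Called from a plain thread or a foreign executor: no context, so it fails.
    println!("{:?}", needs_reactor());
    // Called from inside the runtime: the context is present, so it succeeds.
    println!("{:?}", enter_runtime(needs_reactor));
}
```

In this model, an executor that only polls the future without entering the runtime corresponds to the first call, and running inside a started runtime corresponds to the second.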

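After trying all sorts of suggested solutions, I still could not fix it. Some said it was a tokio version conflict: two components transitively depending on different tokio versions, which adding tokio = "*" would supposedly resolve. In practice it did not. So I went back to the minimal example above, which calls only the elasticsearch SDK and leaves out the project's other dependencies (the original project also used the reqwest package, which depends on tokio). The minimal example fails too, so this is not a dependency conflict at all, but the problem clearly lies with tokio: the panic comes from hyper, the HTTP client underneath the SDK, which expects to run inside a Tokio runtime, and block_on from the futures crate does not start one. I then read the official tokio documentation and learned that an async main function can be annotated with #[tokio::main] and combined with .await. The revised code is as follows: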

use elasticsearch::{http::transport::Transport, BulkParts, Elasticsearch};
use serde_json::Value;

#[tokio::main]
async fn main() {
    println!("Hello, world!");
    send_es("hi".to_string()).await;
    println!("hi");
}

pub async fn send_es(body: String) {
    let transport = Transport::single_node("http://xxx.xxx.net").unwrap();
    let client = Elasticsearch::new(transport);
    let mut bodies: Vec<String> = Vec::with_capacity(1);
    bodies.push(body);
    let response = client
        .bulk(BulkParts::Index("nginx"))
        .body(bodies)
        .send()
        .await
        .unwrap();
    let response_body = response.json::<Value>().await.unwrap();
    println!("{:?}", response_body);
}

Problem solved: the call finally succeeds.

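Summary: tokio is a well-known asynchronous runtime package for Rust. The async and await keywords belong to the Rust language itself; tokio supplies the runtime that actually drives the futures they produce. It also provides a convenient entry point for calling async functions, the #[tokio::main] attribute, which starts a runtime and runs the async main function on it.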
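
Because async and await are part of the language, a future can be driven without tokio at all, as long as something polls it. The sketch below uses only the standard library and shows both halves of the story: calling an async function merely builds a future, and a small executor has to poll it before the body runs. The block_on here is hand-written for illustration and is not the one from the futures crate. ThreadWaker, mark_sent, and demo are made-up names, and mark_sent stands in for send_es.

```rust
use std::future::Future;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waker that unparks the thread blocked in `block_on`.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Minimal executor: poll the future on the current thread until it completes.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            // Sleep until the waker unparks this thread, then poll again.
            Poll::Pending => thread::park(),
        }
    }
}

/// Hypothetical stand-in for `send_es`: records that its body actually ran.
async fn mark_sent(flag: Arc<AtomicBool>) {
    flag.store(true, Ordering::SeqCst);
}

/// Returns (body ran after the bare call, body ran after block_on).
fn demo() -> (bool, bool) {
    let flag = Arc::new(AtomicBool::new(false));
    let fut = mark_sent(flag.clone()); // only builds the future
    let before = flag.load(Ordering::SeqCst);
    block_on(fut); // polling is what drives the body
    let after = flag.load(Ordering::SeqCst);
    (before, after)
}

fn main() {
    println!("{:?}", demo()); // prints "(false, true)"
}
```

This toy executor has no reactor for sockets or timers. That missing piece is what tokio adds, and it is what the elasticsearch SDK needs.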