soxrok2212
03/28/2024, 11:46 PMsoxrok2212
03/28/2024, 11:46 PMCFSworks
03/29/2024, 12:13 AMsoxrok2212
03/29/2024, 12:16 AMsoxrok2212
03/29/2024, 12:17 AMCFSworks
03/29/2024, 12:19 AMsoxrok2212
03/29/2024, 12:19 AMsoxrok2212
03/29/2024, 12:20 AMsoxrok2212
03/29/2024, 12:20 AMCFSworks
03/29/2024, 12:22 AMCFSworks
03/29/2024, 12:24 AM
[ 3392.667905] Unable to handle kernel paging request at virtual address 0000018001ffff78
is. What's interesting about that to me is:
@ rasm2 -d -b64 -aarm -e 'f94002d3 b4000173 d1036273 b4000133 f9402263'
ldr x19, [x22]
cbz x19, 0x30
sub x19, x19, 0xd8
cbz x19, 0x30
ldr x3, [x19, 0x40]
@ hex(0x18000000010-0xd8+0x40)
'0x17fffffff78'
CFSworks
03/29/2024, 12:25 AM
0x18000000010
which looks more like a flags field to me, and then pointer arithmetic is being done on that
CFSworks
03/29/2024, 12:25 AMCFSworks
03/29/2024, 12:26 AM@ hex(0x18002000010-0xd8+0x40)
'0x18001ffff78'
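The two hex() checks above can also be run in reverse, starting from the address in the oops and recovering the value that must have been loaded from [x22]. This is only a sketch of that arithmetic: the function names are made up for illustration, and the two offsets are read directly from the `sub x19, x19, 0xd8` and `ldr x3, [x19, 0x40]` instructions in the disassembly.

```python
# Offsets read from the disassembly in the thread:
#   sub x19, x19, 0xd8    -> step back from an embedded member to its container
#   ldr x3, [x19, 0x40]   -> load a field at offset 0x40 of that container
CONTAINER_OFFSET = 0xD8
FIELD_OFFSET = 0x40


def faulting_address(loaded: int) -> int:
    """Address dereferenced by the final ldr, given the value loaded from [x22]."""
    return loaded - CONTAINER_OFFSET + FIELD_OFFSET


def loaded_value(fault_addr: int) -> int:
    """Inverse of faulting_address(): value in [x22] implied by the oops address."""
    return fault_addr + CONTAINER_OFFSET - FIELD_OFFSET


if __name__ == "__main__":
    oops_address = 0x0000018001FFFF78  # from the "Unable to handle" line
    print(hex(loaded_value(oops_address)))
    print(hex(faulting_address(0x18000000010)))
```

Feeding the oops address through `loaded_value` gives the same 0x18002000010 that the second hex() check started from.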
CFSworks
03/29/2024, 12:26 AMsoxrok2212
03/29/2024, 12:31 AMCFSworks
03/29/2024, 12:31 AMsoxrok2212
03/29/2024, 12:32 AMsoxrok2212
03/29/2024, 12:37 AM$ zcat /proc/config.gz | grep CONFIG_RANDSTRUCT
CONFIG_RANDSTRUCT_NONE=y
soxrok2212
03/29/2024, 12:38 AMCFSworks
03/29/2024, 12:39 AMsoxrok2212
03/29/2024, 12:39 AMsoxrok2212
03/29/2024, 12:39 AMsoxrok2212
03/29/2024, 12:41 AMsoxrok2212
03/29/2024, 12:41 AMCFSworks
03/29/2024, 12:42 AMsoxrok2212
03/29/2024, 12:43 AMsoxrok2212
03/29/2024, 12:43 AM
CONFIG_DM_THIN_PROVISIONING=m to get lvm-thin
soxrok2212
03/29/2024, 12:43 AMARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- make olddefconfig
ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- make bindeb-pkg -j $(nproc)
CFSworks
03/29/2024, 12:49 AM
CONFIG_KASAN and rebuilding the kernel. There'll likely be a performance cost to this, but with any luck it'll identify exactly where the corruption is happening.
soxrok2212
03/29/2024, 12:51 AMsoxrok2212
03/29/2024, 1:11 AMCFSworks
03/29/2024, 1:12 AMsoxrok2212
03/29/2024, 1:12 AMCFSworks
03/29/2024, 1:13 AMsoxrok2212
03/29/2024, 1:13 AMCFSworks
03/29/2024, 1:14 AMsoxrok2212
03/29/2024, 1:15 AMsoxrok2212
03/29/2024, 1:44 AMsoxrok2212
03/29/2024, 1:58 AMsoxrok2212
03/29/2024, 2:16 AMsoxrok2212
03/29/2024, 2:17 AMsoxrok2212
03/29/2024, 3:00 AMCFSworks
03/29/2024, 3:53 AMsoxrok2212
03/29/2024, 3:54 AMsoxrok2212
03/29/2024, 3:54 AMCFSworks
03/29/2024, 3:55 AMsoxrok2212
03/29/2024, 3:56 AMsoxrok2212
03/29/2024, 3:56 AMCFSworks
03/29/2024, 3:59 AMsoxrok2212
03/29/2024, 3:59 AMCFSworks
03/29/2024, 4:02 AMCFSworks
03/29/2024, 4:03 AM
maxcpus=1 to the command line on boot ought to achieve that.
CFSworks
03/29/2024, 4:04 AMsoxrok2212
03/29/2024, 4:05 AMCFSworks
03/29/2024, 4:05 AM
bootargs, but yes set from U-Boot
soxrok2212
03/29/2024, 4:05 AMCFSworks
03/29/2024, 4:05 AM
cat /proc/cmdline, it should be good.
soxrok2212
03/29/2024, 4:06 AMsoxrok2212
03/29/2024, 4:06 AMCFSworks
03/29/2024, 4:06 AMsoxrok2212
03/29/2024, 4:07 AMsoxrok2212
03/29/2024, 4:10 AMCFSworks
03/29/2024, 4:12 AMsoxrok2212
03/29/2024, 4:13 AMsoxrok2212
03/29/2024, 4:14 AM
setenv bootargs 'maxcpus=1'?
CFSworks
03/29/2024, 4:14 AMsoxrok2212
03/29/2024, 4:14 AMsoxrok2212
03/29/2024, 4:14 AMsoxrok2212
03/29/2024, 4:14 AMsoxrok2212
03/29/2024, 4:15 AMCFSworks
03/29/2024, 4:15 AMCFSworks
03/29/2024, 4:15 AMsoxrok2212
03/29/2024, 4:16 AMsoxrok2212
03/29/2024, 4:16 AMbootcmd=bootflow scan -lb
soxrok2212
03/29/2024, 4:16 AMsoxrok2212
03/29/2024, 4:16 AMscriptaddr=0x00c00000
CFSworks
03/29/2024, 4:16 AMsoxrok2212
03/29/2024, 4:17 AMsoxrok2212
03/29/2024, 4:17 AMsoxrok2212
03/29/2024, 4:18 AM## /boot/extlinux/extlinux.conf
##
## IMPORTANT WARNING
##
## The configuration of this file is generated automatically.
## Do not edit this file manually, use: u-boot-update
default l0
menu title U-Boot menu
prompt 0
timeout 50
label l0
menu label Debian GNU/Linux 12 (bookworm) 6.8.0-g235e32bb9813-dirty
linux /boot/vmlinuz-6.8.0-g235e32bb9813-dirty
initrd /boot/initrd.img-6.8.0-g235e32bb9813-dirty
fdtdir /usr/lib/linux-image-6.8.0-g235e32bb9813-dirty/
append root=UUID=afb0e1eb-b0ad-4b79-b9a2-8354818b3b63 rootwait
label l0r
menu label Debian GNU/Linux 12 (bookworm) 6.8.0-g235e32bb9813-dirty (rescue target)
linux /boot/vmlinuz-6.8.0-g235e32bb9813-dirty
initrd /boot/initrd.img-6.8.0-g235e32bb9813-dirty
fdtdir /usr/lib/linux-image-6.8.0-g235e32bb9813-dirty/
append root=UUID=afb0e1eb-b0ad-4b79-b9a2-8354818b3b63 rootwait single
(6.8.0 on my other node)
soxrok2212
03/29/2024, 4:19 AMCFSworks
03/29/2024, 4:19 AM
maxcpus=1 to the append for now
soxrok2212
03/29/2024, 4:19 AMsoxrok2212
03/29/2024, 4:19 AMsoxrok2212
03/29/2024, 4:20 AMsoxrok2212
03/29/2024, 4:26 AMsoxrok2212
03/29/2024, 4:26 AMCFSworks
03/29/2024, 4:26 AM
/proc/cpuinfo?
soxrok2212
03/29/2024, 4:26 AM$ cat /proc/cpuinfo
processor : 0
BogoMIPS : 48.00
Features : fp asimd evtstrm crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp
CPU implementer : 0x41
CPU architecture: 8
CPU variant : 0x2
CPU part : 0xd05
CPU revision : 0
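The check being done by hand here (is maxcpus=1 on the kernel command line, and does /proc/cpuinfo list only one processor?) can be scripted. The sketch below is an illustration only: the helper names are hypothetical, the command line is assembled from the extlinux `append` line plus `maxcpus=1` rather than copied from a real boot, and the cpuinfo sample is abbreviated from the output above.

```python
def cmdline_params(cmdline: str) -> dict:
    """Split a kernel command line into {key: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def processor_count(cpuinfo: str) -> int:
    """Count the 'processor : N' entries in /proc/cpuinfo-style text."""
    return sum(
        1
        for line in cpuinfo.splitlines()
        if line.split(":")[0].strip() == "processor"
    )


# Assumed sample inputs (not captured from the machine in the thread).
SAMPLE_CMDLINE = "root=UUID=afb0e1eb-b0ad-4b79-b9a2-8354818b3b63 rootwait maxcpus=1"
SAMPLE_CPUINFO = """processor : 0
BogoMIPS : 48.00
CPU implementer : 0x41
CPU part : 0xd05
"""

if __name__ == "__main__":
    params = cmdline_params(SAMPLE_CMDLINE)
    print("maxcpus =", params.get("maxcpus"))
    print("processors listed =", processor_count(SAMPLE_CPUINFO))
```

On a live system the two strings would come from reading /proc/cmdline and /proc/cpuinfo instead of the samples.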
soxrok2212
03/29/2024, 4:26 AMCFSworks
03/29/2024, 4:26 AMsoxrok2212
03/29/2024, 4:28 AMsoxrok2212
03/29/2024, 4:28 AMCFSworks
03/29/2024, 4:28 AMsoxrok2212
03/29/2024, 4:28 AMCFSworks
03/29/2024, 4:28 AMCFSworks
03/29/2024, 4:29 AMsoxrok2212
03/29/2024, 4:30 AMsoxrok2212
03/29/2024, 4:30 AMCFSworks
03/29/2024, 4:31 AMsoxrok2212
03/29/2024, 4:38 AMsoxrok2212
03/29/2024, 4:38 AMCFSworks
03/29/2024, 4:38 AMsoxrok2212
03/29/2024, 4:39 AMsoxrok2212
03/29/2024, 4:56 AMCFSworks
03/29/2024, 5:08 AMsoxrok2212
03/29/2024, 5:10 AMsoxrok2212
03/29/2024, 5:10 AMCFSworks
03/29/2024, 5:13 AMCFSworks
03/29/2024, 5:13 AMsoxrok2212
03/29/2024, 5:14 AMsoxrok2212
03/29/2024, 5:14 AMsoxrok2212
03/29/2024, 5:15 AMsoxrok2212
03/29/2024, 1:02 PMsoxrok2212
03/29/2024, 7:29 PMsoxrok2212
03/29/2024, 7:29 PMsoxrok2212
03/30/2024, 2:19 AMsoxrok2212
03/30/2024, 2:19 AMCFSworks
03/30/2024, 2:25 AMsoxrok2212
03/30/2024, 2:26 AMsoxrok2212
03/30/2024, 2:28 AMCFSworks
03/30/2024, 2:28 AMsoxrok2212
03/30/2024, 2:28 AMsoxrok2212
03/30/2024, 2:29 AMCFSworks
03/30/2024, 2:30 AMsoxrok2212
03/30/2024, 7:35 PMCFSworks
03/30/2024, 8:11 PMsoxrok2212
03/30/2024, 9:21 PMsoxrok2212
03/31/2024, 1:00 AMsoxrok2212
03/31/2024, 2:07 AMsoxrok2212
03/31/2024, 2:39 AMsoxrok2212
03/31/2024, 3:06 AMsoxrok2212
03/31/2024, 3:19 PMsoxrok2212
03/31/2024, 3:19 PMsoxrok2212
04/01/2024, 12:52 AMsoxrok2212
04/01/2024, 12:54 AMCFSworks
04/01/2024, 1:00 AMsoxrok2212
04/01/2024, 1:01 AMsoxrok2212
04/01/2024, 1:01 AMsoxrok2212
04/01/2024, 1:01 AMCFSworks
04/01/2024, 1:05 AMsoxrok2212
04/01/2024, 1:09 AMsoxrok2212
04/01/2024, 1:11 AMsoxrok2212
04/01/2024, 1:11 AMsoxrok2212
04/01/2024, 1:11 AMsoxrok2212
04/01/2024, 1:12 AMsoxrok2212
04/01/2024, 1:12 AMsoxrok2212
04/01/2024, 1:14 AMsoxrok2212
04/01/2024, 1:14 AMCFSworks
04/01/2024, 1:21 AMsoxrok2212
04/01/2024, 3:21 AMsoxrok2212
04/01/2024, 3:21 AMsoxrok2212
04/01/2024, 3:21 AMsoxrok2212
04/01/2024, 12:59 PMsoxrok2212
04/01/2024, 1:20 PMsoxrok2212
04/01/2024, 1:20 PMsoxrok2212
04/01/2024, 1:20 PM[ 1620.511576] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 1620.520153] CPU: 6 PID: 1 Comm: systemd Not tainted 6.9.0-rc2 #1
[ 1620.526868] Hardware name: Turing Machines RK1 (DT)
[ 1620.532315] Call trace:
[ 1620.535042] dump_backtrace+0x94/0xec
[ 1620.539144] show_stack+0x18/0x24
[ 1620.542848] dump_stack_lvl+0x38/0x90
[ 1620.546944] dump_stack+0x18/0x24
[ 1620.550648] panic+0x39c/0x3d0
[ 1620.554054] do_exit+0x834/0x92c
[ 1620.557659] do_group_exit+0x34/0x90
[ 1620.561650] copy_siginfo_to_user+0x0/0xc8
[ 1620.566227] do_signal+0x118/0x1378
[ 1620.570126] do_notify_resume+0xc8/0x140
[ 1620.574508] el0_undef+0x84/0x98
[ 1620.578113] el0t_64_sync_handler+0xa0/0x12c
[ 1620.582884] el0t_64_sync+0x190/0x194
[ 1620.586974] SMP: stopping secondary CPUs
[ 1620.591452] Kernel Offset: disabled
[ 1620.595344] CPU features: 0x4,00000003,80140528,4200720b
[ 1620.601280] Memory Limit: none
[ 1620.604690] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
soxrok2212
04/01/2024, 2:22 PMsoxrok2212
04/01/2024, 3:48 PMsoxrok2212
04/01/2024, 3:52 PMsoxrok2212
04/01/2024, 4:07 PM/dev/mmcblk0p3 on / type btrfs (rw,relatime,ssd,discard=async,space_cache=v2,subvolid=5,subvol=/)
soxrok2212
04/01/2024, 4:55 PMCFSworks
04/01/2024, 6:50 PMsoxrok2212
04/01/2024, 8:00 PMsoxrok2212
04/01/2024, 8:00 PMsoxrok2212
04/01/2024, 8:00 PMsoxrok2212
04/01/2024, 8:00 PMCFSworks
04/01/2024, 8:29 PMsoxrok2212
04/01/2024, 8:40 PMsoxrok2212
04/01/2024, 8:41 PMsoxrok2212
04/01/2024, 8:41 PMCFSworks
04/01/2024, 8:59 PMsoxrok2212
04/01/2024, 9:29 PMsoxrok2212
04/02/2024, 11:58 PMSpooky
04/03/2024, 12:16 AMsoxrok2212
04/03/2024, 12:17 AMsoxrok2212
04/03/2024, 12:17 AMsoxrok2212
04/03/2024, 12:17 AMsoxrok2212
04/03/2024, 12:17 AMsoxrok2212
04/03/2024, 12:17 AMsoxrok2212
04/03/2024, 12:18 AMSpooky
04/03/2024, 12:18 AMsoxrok2212
04/03/2024, 12:18 AMSpooky
04/03/2024, 12:18 AMsoxrok2212
04/03/2024, 12:18 AMSpooky
04/03/2024, 12:18 AMsoxrok2212
04/03/2024, 12:18 AMSpooky
04/03/2024, 12:19 AMSpooky
04/03/2024, 12:19 AMSpooky
04/03/2024, 12:19 AMCFSworks
04/03/2024, 12:19 AMsoxrok2212
04/03/2024, 12:20 AMsoxrok2212
04/03/2024, 12:20 AMSpooky
04/03/2024, 12:20 AMsoxrok2212
04/03/2024, 12:20 AMSpooky
04/03/2024, 12:21 AMCFSworks
04/03/2024, 12:21 AMsoxrok2212
04/03/2024, 12:21 AMsoxrok2212
04/03/2024, 12:21 AMCFSworks
04/03/2024, 12:21 AMsoxrok2212
04/03/2024, 12:21 AMsoxrok2212
04/03/2024, 12:22 AMsoxrok2212
04/03/2024, 12:22 AMCFSworks
04/03/2024, 12:22 AM
dd if=/dev/urandom of=/tmp/test.bin bs=4096 count=1024
in a tight loop?
Spooky
04/03/2024, 12:23 AMCFSworks
04/03/2024, 12:24 AM
iperf3 or other non-filesystem I/O?
soxrok2212
04/03/2024, 12:24 AMSpooky
04/03/2024, 12:26 AMsoxrok2212
04/03/2024, 12:26 AMsoxrok2212
04/03/2024, 12:27 AMSpooky
04/03/2024, 12:28 AMsoxrok2212
04/03/2024, 12:28 AMsoxrok2212
04/03/2024, 12:30 AMSpooky
04/03/2024, 12:30 AMCFSworks
04/03/2024, 12:30 AM
/tmp is typically one. mount | grep /tmp to check its type
CFSworks
04/03/2024, 12:30 AM
/ that gets cleared on boot)
soxrok2212
04/03/2024, 12:32 AMudev on /dev type devtmpfs (rw,nosuid,relatime,size=16245064k,nr_inodes=4061266,mode=755)
tmpfs on /run type tmpfs (rw,nosuid,nodev,noexec,relatime,size=3254680k,mode=755)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
ramfs on /run/credentials/systemd-tmpfiles-setup-dev.service type ramfs (ro,nosuid,nodev,noexec,relatime,mode=700)
ramfs on /run/credentials/systemd-tmpfiles-setup.service type ramfs (ro,nosuid,nodev,noexec,relatime,mode=700)
tmpfs on /run/user/1001 type tmpfs (rw,nosuid,nodev,relatime,size=3254676k,nr_inodes=813669,mode=700,uid=1001,gid=1001)
soxrok2212
04/03/2024, 12:33 AMsoxrok2212
04/03/2024, 12:33 AM$ df /tmp
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/nvme0n1p3 32937936 3895364 27348492 13% /
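What the mount and df output above establishes is which filesystem actually backs /tmp: there is no /tmp entry in the mount list, so writes to it land on the root filesystem. A rough sketch of that lookup follows. The function name is hypothetical, and the sample text combines a root line posted earlier in the thread with two of the tmpfs lines above, so it is an assumed mount table rather than this node's real one.

```python
import re

# Matches lines in the format printed by `mount`:
#   <source> on <mount point> type <fstype> (<options>)
MOUNT_LINE = re.compile(r"^(\S+) on (\S+) type (\S+) \((.*)\)$")


def backing_mount(path: str, mount_output: str):
    """Return (source, mount_point, fs_type) of the longest mount point containing path."""
    best = None
    for line in mount_output.splitlines():
        match = MOUNT_LINE.match(line.strip())
        if not match:
            continue
        source, mount_point, fs_type, _options = match.groups()
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best[1]):
                best = (source, mount_point, fs_type)
    return best


# Assumed sample assembled from lines quoted in the thread.
SAMPLE_MOUNTS = """\
/dev/mmcblk0p3 on / type btrfs (rw,relatime,ssd,discard=async,space_cache=v2,subvolid=5,subvol=/)
tmpfs on /run type tmpfs (rw,nosuid,nodev,noexec,relatime,size=3254680k,mode=755)
tmpfs on /run/lock type tmpfs (rw,nosuid,nodev,noexec,relatime,size=5120k)
"""

if __name__ == "__main__":
    print(backing_mount("/tmp/test.bin", SAMPLE_MOUNTS))
    print(backing_mount("/run/lock", SAMPLE_MOUNTS))
```

If the result for a path under /tmp is the root device rather than a tmpfs, a dd into /tmp exercises the storage stack rather than RAM.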
CFSworks
04/03/2024, 12:33 AMsoxrok2212
04/03/2024, 12:33 AMsoxrok2212
04/03/2024, 12:34 AMCFSworks
04/03/2024, 12:34 AMsoxrok2212
04/03/2024, 12:34 AMCFSworks
04/03/2024, 12:35 AM
mount -t tmpfs /tmp /tmp
if you'd like to use RAM instead
soxrok2212
04/03/2024, 12:36 AMsoxrok2212
04/03/2024, 12:37 AMsoxrok2212
04/03/2024, 12:41 AMfor i in $(seq 1 1024); do dd if=/dev/urandom of=/tmp/test.bin bs=4096 count=20480; done
soxrok2212
04/03/2024, 12:47 AMsoxrok2212
04/03/2024, 12:53 AMsoxrok2212
04/03/2024, 12:54 AMsoxrok2212
04/03/2024, 12:54 AMsoxrok2212
04/03/2024, 12:54 AMsoxrok2212
04/03/2024, 12:55 AMdd if=/dev/urandom of=/tmp/test.bin bs=4096 count=2048000 status=progress
soxrok2212
04/03/2024, 12:55 AMsoxrok2212
04/03/2024, 12:55 AMcp /tmp/test.bin /mnt/share/test.bin
soxrok2212
04/03/2024, 12:55 AMsoxrok2212
04/03/2024, 12:57 AMCFSworks
04/03/2024, 12:59 AMCFSworks
04/03/2024, 12:59 AMCFSworks
04/03/2024, 1:00 AMsoxrok2212
04/03/2024, 1:00 AMsoxrok2212
04/03/2024, 1:00 AMsoxrok2212
04/03/2024, 2:11 AMsoxrok2212
04/10/2024, 8:46 PMsoxrok2212
05/03/2024, 1:36 PMsoxrok2212
05/03/2024, 1:36 PMsoxrok2212
05/05/2024, 9:48 PM[ 564.470798] Unable to handle kernel paging request at virtual address 0068e851806854c4
[ 564.479679] Mem abort info:
[ 564.481027] Unable to handle kernel paging request at virtual address 0058db8f59c78b97
[ 564.482787] ESR = 0x0000000096000004
[ 564.482790] EC = 0x25: DABT (current EL), IL = 32 bits
[ 564.482793] SET = 0, FnV = 0
[ 564.482794] EA = 0, S1PTW = 0
[ 564.482796] FSC = 0x04: level 0 translation fault
[ 564.482798] Data abort info:
[ 564.482800] ISV = 0, ISS= 0x00000004, ISS2 = 0x00000000
[ 564.482802] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 564.482804] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 564.482806] [0068e851806854c4] address between user and kernel address ranges
[ 564.482809] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[ 564.482812] Modules linked in: ip6table_filter ip6_tables iptable_filter bridge stp llc cfg80211 rfkill crct10dif_ce rk805_pwrkey hantro_vpu pwm_fan v4l2_vp9 v4l2_h264 v4l2_mem2mem videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 rockchip_thermal videodev videobuf2_common mc fuse ip_tables x_tables ipv6 dm_thin_pool dm_persistent_data dm_bufio dm_bio_prison libcrc32c dm_mod dwmac_rk stmmac_platform stmmac rtc_hym8563 phy_rockchip_naneng_combphy pcs_xpcs rockchipdrm nvme analogix_dp dw_hdmi cec dw_hdmi_qp dw_mipi_dsi drm_display_helper drm_dma_helper drm_kms_helper nvme_core drm backlight
[ 564.482878] CPU: 4 PID: 277 Comm: systemd-journal Not tainted 6.9.0-rc1-g99fc9cef1176-dirty #1
[ 564.482882] Hardware name: Turing Machines RK1 (DT)
[ 564.482884] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
soxrok2212
05/05/2024, 9:48 PM[ 564.482889] pc : __d_lookup_rcu+0x4c/0xf8
[ 564.482901] lr : lookup_fast+0x34/0x144
[ 564.482908] sp : ffff800084343a90
[ 564.482909] x29: ffff800084343a90 x28: ffff800084343c80 x27: 0000000000000000
[ 564.482914] x26: 2f2f2f2f2f2f2f2f x25: d0d0d0d0d0d0d0d0 x24: ffff800084343c80
[ 564.482919] x23: fefefefefefefeff x22: ffff0001068a8026 x21: ffff00010777d000
[ 564.482923] x20: ffff800084343c80 x19: ffff800084343c80 x18: 0000000000000000
[ 564.482927] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 564.482931] x14: 0000000000000000 x13: ffff0001068a8021 x12: ffff800084343cc4
[ 564.482936] x11: 000000046a4d2343 x10: 000ffffffffffff8 x9 : 0000000000000004
[ 564.482940] x8 : ffff00010777d000 x7 : e6e8e2ffb3bfa2a0 x6 : 0000000000210000
[ 564.482945] x5 : ffff800081b72000 x4 : 00000000001a9348 x3 : ffff0007da200000
[ 564.482949] x2 : 0000000000000004 x1 : 0268e851806854c8 x0 : 0268e851806854c8
[ 564.482953] Call trace:
[ 564.482955] __d_lookup_rcu+0x4c/0xf8
[ 564.482961] walk_component+0x28/0x190
[ 564.482964] link_path_walk.part.0.constprop.0+0x294/0x394
[ 564.482968] path_openat+0xa8/0xef4
[ 564.482971] do_filp_open+0x9c/0x14c
[ 564.482974] do_sys_openat2+0xc0/0xf4
[ 564.482979] __arm64_sys_openat+0x64/0xa4
[ 564.482983] invoke_syscall+0x48/0x114
[ 564.482989] el0_svc_common.con
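The "Mem abort info" lines in the oops are the kernel's own breakdown of the ESR value. As a cross-check, the same fields can be pulled out of 0x96000004 by hand. The sketch below assumes the standard ESR_ELx layout (EC in bits 31:26, IL in bit 25, ISS in bits 24:0, and for data aborts WnR in ISS bit 6 and the fault status code in ISS bits 5:0); the function name is made up for illustration.

```python
def decode_esr(esr: int) -> dict:
    """Split an AArch64 ESR value into the fields the oops prints."""
    ec = (esr >> 26) & 0x3F   # exception class
    il = (esr >> 25) & 0x1    # 1 = 32-bit instruction
    iss = esr & 0x1FFFFFF     # instruction-specific syndrome
    fields = {"EC": ec, "IL": il, "ISS": iss}
    if ec in (0x24, 0x25):    # data abort from a lower EL / from the current EL
        fields["WnR"] = (iss >> 6) & 0x1   # 0 = read, 1 = write
        fields["FSC"] = iss & 0x3F         # fault status code
    return fields


if __name__ == "__main__":
    for name, value in decode_esr(0x96000004).items():
        print(f"{name} = {value:#x}")
```

For the value in this oops that yields EC 0x25, IL 1, WnR 0 and FSC 0x04, which agrees with the "DABT (current EL)", "IL = 32 bits", "WnR = 0" and "level 0 translation fault" lines printed by the kernel.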
soxrok2212
05/05/2024, 9:48 PMCFSworks
05/05/2024, 10:08 PMsoxrok2212
05/05/2024, 10:16 PMsoxrok2212
05/05/2024, 10:16 PMsoxrok2212
05/05/2024, 10:16 PM[ 460.507171] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 460.515737] CPU: 6 PID: 1 Comm: systemd Not tainted 6.7.0+ #1
[ 460.522153] Hardware name: Turing Machines RK1 (DT)
[ 460.527597] Call trace:
[ 460.530321] dump_backtrace+0x98/0x118
[ 460.534506] show_stack+0x18/0x24
[ 460.538202] dump_stack_lvl+0x74/0xc0
[ 460.542289] dump_stack+0x18/0x24
[ 460.545984] panic+0x3b4/0x3f0
[ 460.549384] do_exit+0x8cc/0x9b8
[ 460.552984] do_group_exit+0x34/0x90
[ 460.556972] get_signal+0x954/0x97c
[ 460.560862] do_notify_resume+0x298/0x1400
[ 460.565432] el0_da+0x8c/0x90
[ 460.568733] el0t_64_sync_handler+0xb8/0x12c
[ 460.573489] el0t_64_sync+0x1a4/0x1a8
[ 460.577575] SMP: stopping secondary CPUs
[ 460.582047] Kernel Offset: disabled
[ 460.585936] CPU features: 0x1,80000000,70028146,2100720b
[ 460.591865] Memory Limit: none
[ 460.595271] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---
soxrok2212
05/05/2024, 10:21 PMCFSworks
05/05/2024, 11:32 PM
memtester just to eliminate RAM problems as the culprit?
Spooky
05/06/2024, 12:32 AMsoxrok2212
05/06/2024, 12:32 AMSpooky
05/06/2024, 12:32 AMsoxrok2212
05/06/2024, 12:32 AMSpooky
05/06/2024, 12:34 AMSpooky
05/06/2024, 12:35 AMSpooky
05/06/2024, 12:36 AMSpooky
05/06/2024, 12:41 AMsoxrok2212
05/06/2024, 2:39 AMsoxrok2212
05/06/2024, 2:39 AMsoxrok2212
05/06/2024, 2:42 AMsoxrok2212
05/06/2024, 4:09 AMSpooky
05/06/2024, 11:08 AMSpooky
05/06/2024, 1:05 PMsoxrok2212
05/06/2024, 1:14 PMsoxrok2212
05/06/2024, 1:14 PMSpooky
05/06/2024, 1:16 PMsoxrok2212
06/28/2024, 8:20 PMsoxrok2212
06/28/2024, 8:21 PMsoxrok2212
06/28/2024, 8:22 PMsoxrok2212
06/28/2024, 8:23 PMsoxrok2212
06/28/2024, 8:34 PMCFSworks
06/28/2024, 9:07 PMCFSworks
06/28/2024, 9:08 PM[ 140.323832] thermal_sys: Failed to bind 'package-thermal' with 'pwm-fan': -17
[ 143.054645] rockchip-pm-domain fd8d8000.power-management:power-controller: failed to set domain 'gpu', val=0
[ 145.762181] rockchip-pm-domain fd8d8000.power-management:power-controller: failed to get ack on domain 'gpu', val=0x1bffff
CFSworks
06/28/2024, 9:09 PMsoxrok2212
06/28/2024, 9:11 PMsoxrok2212
06/28/2024, 9:11 PMsoxrok2212
06/28/2024, 9:11 PMsoxrok2212
06/28/2024, 9:12 PMsoxrok2212
06/28/2024, 9:34 PMsoxrok2212
06/28/2024, 9:38 PMsoxrok2212
06/28/2024, 9:39 PMCFSworks
06/28/2024, 10:20 PM
rockchip_do_pmu_set_power_domain is trying to tell the PMU to provide power to the GPU, and timing out while waiting for the PMU to confirm that it's powered (this is the "failed to set domain 'gpu', val=0" message)
CFSworks
06/28/2024, 10:21 PM
rockchip_pmu_set_idle_request to try to take the GPU out of "idle mode" (I guess this supplies it with clocks?) but the PMU never acknowledges that request either ("failed to get ack on domain 'gpu', val=0x1bffff"), probably because the GPU was never actually powered up
CFSworks
06/28/2024, 10:21 PM
rockchip_pmu_restore_qos which tries to set some registers on the GPU('s bus controller) itself, but the bus sends back an error because the GPU isn't powered, and when the bus error gets back to the CPU it appears as a panic
CFSworks
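Since the two rockchip-pm-domain messages are the visible signature of this failure, they are easy to pick out of a saved dmesg. The sketch below is one hedged way to do that: the regular expression is fitted only to the two log lines quoted in this thread, and the function name and the stage labels in the result are hypothetical.

```python
import re

# Fitted to the two messages quoted in the thread; other kernels may word them differently.
PM_DOMAIN_FAILURE = re.compile(
    r"rockchip-pm-domain \S+: failed to (set domain|get ack on domain) "
    r"'([^']+)', val=(0x[0-9a-fA-F]+|\d+)"
)


def power_domain_failures(dmesg_text: str) -> list:
    """Return (stage, domain, value) for every pm-domain failure line found."""
    failures = []
    for line in dmesg_text.splitlines():
        match = PM_DOMAIN_FAILURE.search(line)
        if match:
            stage = "power-on" if match.group(1) == "set domain" else "idle-ack"
            failures.append((stage, match.group(2), int(match.group(3), 0)))
    return failures


SAMPLE_DMESG = """\
[ 140.323832] thermal_sys: Failed to bind 'package-thermal' with 'pwm-fan': -17
[ 143.054645] rockchip-pm-domain fd8d8000.power-management:power-controller: failed to set domain 'gpu', val=0
[ 145.762181] rockchip-pm-domain fd8d8000.power-management:power-controller: failed to get ack on domain 'gpu', val=0x1bffff
"""

if __name__ == "__main__":
    for stage, domain, value in power_domain_failures(SAMPLE_DMESG):
        print(f"{domain}: {stage} failed, val={value:#x}")
```

Seeing the power-on failure followed by the idle-ack failure for the same domain matches the sequence described above.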
06/28/2024, 10:22 PMsoxrok2212
06/28/2024, 11:01 PMsoxrok2212
06/28/2024, 11:02 PMsoxrok2212
06/28/2024, 11:02 PMCFSworks
06/28/2024, 11:04 PMsoxrok2212
06/28/2024, 11:04 PMcat /sys/firmware/devicetree/base/gpu\@fb000000/status
okay
soxrok2212
06/28/2024, 11:05 PMls /dev/dri/
by-path card0 renderD128
CFSworks
06/28/2024, 11:05 PMsoxrok2212
06/28/2024, 11:05 PMsoxrok2212
06/28/2024, 11:06 PMsoxrok2212
06/28/2024, 11:06 PMsoxrok2212
06/28/2024, 11:06 PMsoxrok2212
06/28/2024, 11:06 PMsoxrok2212
06/28/2024, 11:06 PMCFSworks
06/28/2024, 11:13 PMsoxrok2212
06/28/2024, 11:30 PMsoxrok2212
06/28/2024, 11:32 PMsoxrok2212
06/28/2024, 11:33 PMCFSworks
06/28/2024, 11:39 PMCFSworks
06/28/2024, 11:39 PMCFSworks
06/28/2024, 11:40 PMCFSworks
06/28/2024, 11:42 PMsoxrok2212
06/28/2024, 11:43 PMsoxrok2212
06/28/2024, 11:44 PMCFSworks
06/28/2024, 11:47 PMsoxrok2212
06/28/2024, 11:50 PMsoxrok2212
06/30/2024, 3:00 PMsoxrok2212
07/18/2024, 1:32 PMsoxrok2212
07/18/2024, 1:32 PM[ 4.472806] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008
[ 4.482717] Mem abort info:
[ 4.485846] ESR = 0x0000000096000004
[ 4.490059] EC = 0x25: DABT (current EL), IL = 32 bits
[ 4.496022] SET = 0, FnV = 0
[ 4.499455] EA = 0, S1PTW = 0
[ 4.502976] FSC = 0x04: level 0 translation fault
[ 4.508451] Data abort info:
[ 4.511688] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 4.517843] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 4.523514] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 4.529476] [0000000000000008] user address but active_mm is swapper
[ 4.536606] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[ 4.543628] Modules linked in:
[ 4.547054] CPU: 3 PID: 31 Comm: cpuhp/3 Not tainted 6.10.0-rc1+ #4
[ 4.554076] Hardware name: Turing Machines RK1 (DT)
[ 4.559529] pstate: a0400009 (NzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 4.567331] pc : blk_mq_hctx_notify_online+0x34/0xb0
[ 4.572903] lr : cpuhp_invoke_callback+0x2c4/0x560
[ 4.578278] sp : ffff80008249bd50
[ 4.581990] x29: ffff80008249bd50 x28: ffff800081da9000 x27: 0000000000000000
[ 4.589999] x26: 00000000000000ec x25: ffff0007fbf26c78 x24: 00000000000002f3
[ 4.598006] x23: ffff8000807d07cc x22: ffff0007fbf26ca0 x21: ffff000101c03978
[ 4.606012] x20: 0000000000000097 x19: ffff000101c03800 x18: ffff80008249bc78
[ 4.614018] x17: 000000040044ffff x16: 005000f2b5503510 x15: 0000000000000000
[ 4.622023] x14: ffff8000813b11a8 x13: ffffffffffffffff x12:000000034 x7 : ffff000100e8fe08 x6 : ffff000101aa5200
[ 4.646021] x5 : ffff800081dad000 x4 : ffff0007fbf26ca0 x3 : 0000000000000003
[ 4.654025] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000108496698
[ 4.662031] Call trace:
[ 4.66-
[ 14.356782] platform a40000000.pcie: deferred probe pt: deferred probe pending: (reason unknown)
soxrok2212
07/18/2024, 1:42 PMsoxrok2212
07/18/2024, 1:42 PMsoxrok2212
07/18/2024, 1:55 PMsoxrok2212
07/18/2024, 2:57 PMsoxrok2212
07/18/2024, 3:06 PMsoxrok2212
07/18/2024, 3:06 PMsoxrok2212
07/18/2024, 3:07 PMsoxrok2212
07/18/2024, 3:08 PMCFSworks
07/18/2024, 3:27 PMsoxrok2212
07/18/2024, 4:10 PMsoxrok2212
07/18/2024, 4:11 PMsoxrok2212
07/18/2024, 4:16 PMsoxrok2212
07/18/2024, 4:16 PM** File not found ubootefi.var **
Failed to load EFI variables
** Unable to write file ubootefi.var **
Failed to persist EFI variables
** Unable to write file ubootefi.var **
Failed to persist EFI variables
** Unable to write file ubootefi.var **
Failed to persist EFI variables
0 efi_mgr ready (none) 0 <NULL>
** Booting bootflow '<NULL>' with efi_mgr
Loading Boot0000 'mmc 0' failed
EFI boot manager: Cannot load any image
Boot failed (err=-14)
pcie_dw_rockchip pcie@fe180000: PCIe-0 Link Fail
** Unable to write file ubootefi.var **
Failed to persist EFI variables
k#1.bootdev.part_3' with extlinux
soxrok2212
07/18/2024, 7:31 PMCFSworks
09/10/2024, 3:16 AMsoxrok2212
09/10/2024, 9:15 AMsoxrok2212
09/10/2024, 10:02 AMsoxrok2212
09/10/2024, 10:03 AMDhanOS
09/10/2024, 10:23 AMSpooky
09/10/2024, 4:13 PMCFSworks
09/10/2024, 6:41 PM
echo fb000000.gpu > /sys/bus/platform/drivers/panthor/bind
post-booting. So the bug might be that nothing in the DT tells Linux that the GPU power domain depends on the VDD_GPU_S0 regulator, only that the GPU itself depends on the regulator, and it's just gone unnoticed because I'm the one person who's trying to enable the GPU well after boot, by which point the regulator has been switched off.
CFSworks
09/10/2024, 9:57 PM
regulator-always-on; under vdd_gpu_s0: vdd_gpu_mem_s0: dcdc-reg1 made the problem go away. This dependency of the GPU power domain on the external regulator should really be memorialized in the DT somehow. Time to see if I can run an OpenCL workload as a quick test. 🤔
@DhanOS For your personal "support knowledgebase" -- this error on RK1 means "GPU is not receiving power from the board". In my case that was due to a software issue, but it can also mean that the regulator is damaged:
[ 143.054645] rockchip-pm-domain fd8d8000.power-management:power-controller: failed to set domain 'gpu', val=0
[ 145.762181] rockchip-pm-domain fd8d8000.power-management:power-controller: failed to get ack on domain 'gpu', val=0x1bffff
soxrok2212
09/10/2024, 9:58 PMsoxrok2212
09/10/2024, 9:59 PMDhanOS
09/10/2024, 9:59 PMDhanOS
09/10/2024, 10:00 PMCFSworks
09/10/2024, 10:03 PMDhanOS
09/10/2024, 10:04 PMCFSworks
09/10/2024, 10:05 PM
Other graphics APIs (Vulkan, OpenCL) are not supported at this time.
okay so it's just OpenGL for now.
CFSworks
09/10/2024, 10:06 PMDhanOS
09/10/2024, 10:06 PMDhanOS
09/10/2024, 10:06 PMsoxrok2212
09/10/2024, 10:33 PMsoxrok2212
09/10/2024, 10:33 PMCFSworks
09/10/2024, 10:33 PMCFSworks
09/10/2024, 10:33 PMsoxrok2212
09/10/2024, 10:45 PMsoxrok2212
09/10/2024, 11:40 PMsoxrok2212
09/10/2024, 11:40 PMCFSworks
09/10/2024, 11:52 PMsoxrok2212
09/10/2024, 11:52 PMCFSworks
09/10/2024, 11:53 PMsoxrok2212
09/10/2024, 11:55 PMsoxrok2212
09/10/2024, 11:56 PMsoxrok2212
09/10/2024, 11:56 PMCFSworks
09/10/2024, 11:57 PMsoxrok2212
09/11/2024, 12:00 AM