Firmware updates aka Turing Pi 2 Community Edition...
d
I started a thing:
_____ _   _ ____  ___ _   _  ____   ____  ___   ___  
|_   _| | | |  _ \|_ _| \ | |/ ___| |  _ \|_ _| |_  \ 
  | | | | | | |_) || ||  \| | |  _  | |_) || |    ) | 
  | | | |_| |  _ < | || |\  | |_| | |  _ / | |   / /  
  |_|  \___/|_| \_|___|_| \_|\____| |_|   |___| |___| 
Community Updates by DhanOS
I took the firmware and started making updates, and I'm releasing the first version now, with more to come. The first version, `v0.1.0`, contains:
- Added SSH root logins (you can log in via SSH without making any changes over UART or adb shell)
- Set a static MAC address (`12:34:56:78:9A:BC`; later you'll be able to pick your own MAC, which is especially useful with multiple boards)
- Added `ntp` and `ntpd` (automatic time synchronization from the internet)
- A few more smaller changes
Full changelog and initial TODO: https://github.com/daniel-kukiela/turing-pi2-community-firmware/blob/master/changelog.md
Project GitHub: https://github.com/daniel-kukiela/turing-pi2-community-firmware
Feel free to propose which updates you would like to see.
b
I've not looked at the code (still waiting for my TPI2 to arrive), but I assume somewhere in there is a routine that powers up each node in succession (defaulting to 1,2,3,4). May I suggest the ability to set the order in which each node gets powered? So for example, if I have an NFS server running on node 3 or 4, I'd like to power one of those nodes before nodes 1 and 2, so that the NFS server is up and running by the time nodes 1 & 2 join the crowd. Just an idea.
d
Sure, can be done and could even be configurable with the delays between when each node is powered. Adding to the list
b
👍 🙂 -- Oh and just for fun, I'm trying to compile your code under WSL:debian (so far so good)
d
WSL is not a bad idea, might be actually more convenient than using a VM for this
b
...this could take a while though... I'm using a Celeron J4125 😉
d
Oh it's going to take time anyway 🙂
w
Yes, the code powers up 1,2,3,4 at 1 second intervals. Looks like it could be changed very easily and the delay increased to give the NFS server time to get itself sorted out.
Doh, @DhanOS (Daniel Kukiela) beat me to it.
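A minimal sketch of the configurable power-on order and per-node delays discussed above. This is not the firmware's code: the function name, the `delays` mapping and the example values are hypothetical, and only the default of one second between nodes comes from the thread.

```python
from typing import Dict, List, Tuple


def power_on_schedule(
    order: List[int],
    delays: Dict[int, float],
    default_delay: float = 1.0,
) -> List[Tuple[float, int]]:
    """Return (seconds_from_start, node) pairs for a given power-on order.

    `delays[node]` is how long to wait after powering `node` before the
    next one is powered; nodes without an entry use `default_delay`.
    """
    if sorted(order) != [1, 2, 3, 4]:
        raise ValueError("order must list each of nodes 1-4 exactly once")
    schedule = []
    offset = 0.0
    for node in order:
        schedule.append((offset, node))
        offset += delays.get(node, default_delay)
    return schedule


# Hypothetical example: node 3 hosts the NFS server and gets a 30 s head start.
for offset, node in power_on_schedule([3, 1, 2, 4], {3: 30.0}):
    print(f"t+{offset:>4.1f}s: power on node {node}")
```

Keeping the schedule as plain data means the same list could come from a config file on the SD card, as proposed later in the thread.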
t
I suggest using AllWinner's registered OUI of DC:44:6D in your code. That way those of us who maintain DHCP servers should, by default, be able to find our TPiV2's BMC. It's unclear whether AllWinner doesn't program an address somewhere on the chip or u-boot doesn't pick it up.
Please consider releasing your changes to the "official" GitHub repo as one or more diff patch file(s). That way we can clone the official code and apply patches we want. Backing a patch out is a simple action.
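To illustrate the OUI suggestion above, here is a small sketch that builds a MAC address from a vendor prefix and checks the "locally administered" bit. The helper names are hypothetical, and the `DC:44:6D` prefix is taken from the post as-is rather than verified against the IEEE registry.

```python
import secrets
from typing import Optional

ALLWINNER_OUI = "DC:44:6D"  # as quoted in the thread, not independently verified


def mac_with_oui(oui: str, device_bytes: Optional[bytes] = None) -> str:
    """Combine a 3-octet OUI with 3 device-specific octets (random if omitted)."""
    prefix = [int(part, 16) for part in oui.split(":")]
    if len(prefix) != 3 or any(not 0 <= b <= 0xFF for b in prefix):
        raise ValueError("OUI must be three colon-separated hex octets")
    if device_bytes is None:
        device_bytes = secrets.token_bytes(3)
    if len(device_bytes) != 3:
        raise ValueError("device_bytes must be exactly 3 bytes")
    return ":".join(f"{b:02X}" for b in prefix + list(device_bytes))


def is_locally_administered(mac: str) -> bool:
    """True if the second-least-significant bit of the first octet is set."""
    return bool(int(mac.split(":")[0], 16) & 0x02)


print(mac_with_oui(ALLWINNER_OUI))
print(is_locally_administered("12:34:56:78:9A:BC"))
```

The static address shipped in v0.1.0 has the locally administered bit set, which is why a DHCP server filtering on a vendor prefix would not recognize it.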
d
Hmmm, patches are going to be hard to maintain since one might be dependent on another, or later updates might change things, but I'm already working on a config file that you can put on the SD card (or even upload to the BMC's internal memory) that will let you enable, disable or set things. I'm not going to make it "my firmware", but more like get the requests and suggestions and let people enable and configure features the way they want
I'll think on this solution, maybe even make it so you can enable things on the compilation time, but I don't know yet
t
Yeah. I started looking at it. Just ran diff with the proper options against both trees. Might be more work than it's worth.
d
And it'll get more complicated with more changes and a way for people to configure things
t
I guess it depends on the changes. The Rust code will supersede what's under "app".
d
The new firmware will make my efforts obsolete, but I like the challenge and I'll provide the updates until then. The new firmware is most likely going to be entirely new/different
s
Feature request: system event log to record board-level hardware errors (common feature in enterprise BMCs, code may be available in OpenBMC)
g
fix spelling mistakes
usb mode:                           Host                           Device                         Host conneCt node:
y
Why don’t you use a codespace for it?
d
Why, do you think, I would need it?
u
Codespace is nice, you can provide a prebuilt image for people who don't want to run all the compilation on their own computers, GitHub is very generous with their free compute hours per month
I tried compiling in a VirtualBox on my few years older laptop but when it started glowing red I used Codespace instead 😄
d
I'm providing firmware images, no one needs to compile anything
I'm not denying Codespace can be useful, I'm just not sure how it would be useful here and now 🙂
u
It was useful for me, since 0.2.0 isn't up yet 😄
d
0.2.0 does not work as expected, yet 🙂
u
In any case, the problem it solves is less friction for anyone interested in doing their own hacking on the firmware, trying out things, compiling their own images
Also nice to run on GitHub's fast servers instead of a slow laptop
d
I might look into it, but I also have to set priorities, and I think they're currently somewhere else
u
It's not much work, I think you just have to start a codespace, install the things listed in the readme, publish that as a "prebuilt" codespace and then the community can improve on that
Ah, anyway, I thought the SD card workaround in the new commits would solve the "no sdcard" error message I'm getting when trying to flash the firmware via HTTP but I guess I need to get a TTL cable to fix that
d
That'll be fixed in v0.2 (I mean already is)
It's not in the original firmware
Also the comment is a bit misleading
The change mainly allows for bigger rootfs, but also requires an SD card to make subsequent updates
u
yeah, I understand
I just need to work around the error to be able to flash the firmware
d
cd /mnt
mkdir sdcard
reboot
u
no ssh/serial access yet
d
Because the original firmware does not provide it
My v0.1 does
You'll need a USB/UART cable now, or `adb shell` with a Micro USB cable
u
Yep, gonna go shopping tomorrow
d
Only if you want the USB/UART, otherwise you can use a Micro USB cable and adb
u
I'm not touching adb unless it's on a separate computer 😄
d
Well, ok 🙂
r
I would love to build locally on my Mac but am having trouble as the build does not support an ARM architecture - funny considering that the target is ARM... Tried running it in docker and emulated i386 but in the end I gave up... Long story short - great that you provide prebuilt images!
Once you tick off the static IP configured on an SD card I might switch as I don't connect the TP2 to my local network with DHCP.
Love the NTP support btw.
d
And I will keep providing prebuilt images - every time I have something I consider a stable version 🙂 If there is demand I could also provide nightly-ish builds (whenever a commit is made to the repository).
Thank you 🙂
r
Maybe it is too much of a niche feature, but if you add dnsmasq to the BMC image and ideally with configuration and the tftp root on the SD card, I would start experimenting with netbooting the nodes.
I can help with a few notes on how that would/could work, if you want, but as I can't compile (see above) can't do much else than that and testing.
d
I'm trying to sort out the available space on the eMMC. Turns out we might have less space than I thought. The chip is 128MB in size (there's a mistake in the docs) and it is using a doubled filesystem. This means we might only have 32 MB for the root partition. I'm talking with the dev of the original firmware to sort this out. I might need to move some binaries onto the SD card and require the SD card to run. I don't know yet. But once I sort this out and add a few other features, I totally might do this too.
I'll definitely ask once I have the binaries added to the firmware image (one way or another)
b
HELL Yes! Looks like you and I are on the same wavelength here. I picked up a 128GB PCIe card that I'm hoping to use on node 1 | 2 as a netboot/PXE server for the other nodes to boot from and avoid SD cards in at least 3 of my 4 CM4s
h
Maybe use the eMMC to install a boot loader and install the system on the SD, this will reduce the wear (over time) of the eMMC.
y
Well, I thought to overcome the local limitations, it could make sense to use a cloudbased service that was created for this… integrated into the repo…. And can fire on release/push…. Whatever…
d
This is one of the potential ways to solve this. I'm still finding out the options and I haven't made any decisions yet. Maybe I'll provide 2 firmware versions - slim that fits into the internal memory and full that will require the SD card. I just don't know yet 🙂
Maybe, I'm not denying. I don't think I'm limited, though, with 18 cores and 128 GBs of RAM 🙂
But in this case that's independent from the BMC, right?
b
Don't know yet... still waiting for my board to ship. I'm just fantasizing of how I want to set it up
d
Ok, but so far I don't see how the BMC could be involved here 🙂 The NVMe PCIe card - you could connect it indeed to Node 1 or Node 2 via Mini PCIe or M.2, but then this node will become responsible for hosting the images, not the BMC like in the rubberduck's idea 🙂
b
Isn't the BMC involved in communicating with each of the nodes?
d
It is not, it does not provide any method for the communication. You might try to use UART which won't make much sense, though. The way to communicate between the nodes would be the ethernet network
rubberduck's idea is to use the BMC and its SD card to host the images. The BMC is connected to the same network switch as the nodes, and thus could serve the images to the nodes this way
b
Oh, ok I misunderstood what was being discussed.
d
So, the way you think of it would not need any changes to the BMC 🙂
b
Well certainly not DNSMasq, but still hoping to be able to set the order & delay each node starts
d
And this is actually something I'm going to add to the firmware. 🙂 Currently, I need to solve the available space issue and learn about the possibilities, so the new firmware version will be released later than I wanted
b
🙂 I know, we talked about it a few days back. I have a PCIe 128GB card and a PCIe WiFi card (the same card Jeff used in his video "PI networking faster than his MAC"). Hypothetically the WiFi card will be used to allow external connections into the box and the 128GB card will contain 1 running Debian OS and 3 network boot images for the other 3 nodes (all nodes are CM4s (8GB/Lite))
d
Yeah, I think I remember this talk now, thank you 🙂
b
👍
d
What was your reasoning for using the WiFi card again, please? The communication between nodes would be up to 1 Gb/s anyway
b
The WiFi will be used to ssh into the cluster remotely. I won't be using the ethernet ports (unless the WiFi fails)
d
Ah, ok, thank you
b
2 weeks, woo woo!
d
What do you mean? Or maybe wrong channel?
b
Haha, yes, sorry!
d
I'm running a poll in #754950670175436848 (so more people can see it): https://discord.com/channels/754950670175436841/754950670175436848/1082662801874616352
s
This is the chip being used, right? https://linux-sunxi.org/T113-s3
d
Yes, correct
s
There are tools and info listed on that page that may help.
d
Yeah, but for now I want to keep the process simple so that most people can use the firmware. The poll seems to be in line with that. This means we need no tools for flashing 🙂 As for the firmware itself - I just started this project, I don't know yet where this is going to get us, if anywhere 🙂 If people like this work and we figure out we have ideas and we want to re-work it partially or entirely - we will.
s
Okay; just trying to find some tools and links. This may also be relevant though I'm getting out of my depth, having never built for this platform: https://github.com/linux-sunxi/sunxi-tools
d
Sure, thank you, this is always appreciated 🙂
t
It might be best to leave the T113-S3's internal partitioning alone and install minimally viable "firmware" on it. Just enough functionality to know whether a microSD card is installed and prompt for one if none is found, then retry the boot process. Move the wear and tear of multiple erase cycles onto something that can be replaced. It would require use of the BMC TTL or adb connection unless some sort of BMC LED error code pattern could be implemented. Wouldn't be the most user friendly solution. More like expert friendly. 😉
d
This is kind of what I am thinking of - a minimal firmware, like v0.1.0 (that adds SSH root logins, NTP and static MAC) as a failover and the full OS on the SD card. The idea is that if your SD card breaks and you replace it, or the system cannot be booted for whatever reason, the failover firmware boots and lets you debug the problem and/or flash a new SD card. The whole root file system might be moved onto the SD card.
t
Works for me.
m
notes: not sure what exactly the purpose of this private key is, but it's generally not a good idea to have it in the clear in a repo https://github.com/daniel-kukiela/turing-pi2-community-firmware/blob/master/br2t113pro/board/100ask/rootfs_overlay/mnt/self.key
i think this certificate should be autogenerated on first run instead, otherwise this means tpicli is not tls secure
d
It's also not being used for anything 🙂 I'm going to add SSL and will definitely not use this certificate 🙂
t
A self-signed key is suboptimal. A cert does need to be present for HTTPS support, but I don't like having to manually trust them. It's typical for low-cost routers to use this technique. Ideally Turing Machines would invest in a root signing cert, but I don't expect it.
d
Is there any way to create a certificate that matches any domain name (also special-use domains like .local) and any IP that I am missing?
Like how you make a certificate for 192.168.0.123 and turingpi.local that's signed
t
Yeah. I know certificate trust chains are complex and inflexible. I wouldn't expose a Turing Pi V2 to the open Internet. I guess the BMC should default to a locally self-signed (not canned) one and have an interface to install one for those that need it.
d
The thing is more that .local is not a TLD, so you cannot have a certificate for such a domain. It would carry some risk (because many networks could have turing.local and a single certificate would match them all). In the same way, you cannot have a certificate for an IP (technically you could, but not from the private IP pool, unless I'm missing something). So it's not possible to have such a certificate, and having self-signed certificates is not just a cheap router thing
Without a public domain (that's also a TLD) your only choice is to use any certificate, since it'll never match the domain (or IP) part in the browser; thus self-signed ones work the same way as any other certificates
The solution to this might be to have a public domain, like clients.turingpi.com and then a DNS server that contains private IPs of the boards (which would not be the best because of the security) and then the certificate for each clients subdomain, for example sometpiname.clients.turingpi.com and you'll have to use this exact domain and a private/public key generated for this exact subdomain on your TPi to have a signed certificate for your board
I haven't seen any device in a local network with a valid certificate - no matter if it's a cheap router or some other expensive device
Again, please let me know if I'm missing something. I might not know about something
So, anyway, a self-signed certificate should indeed be generated on the first boot, or an ability to use your own private/public key pair could also be added
With your own private/public key pair you can have a domain like somemydomain.com that's not really public but points into your network, and then you can have turing.somemydomain.com that has your private IP on your private DNS. Then you can use any cert provider, even the free Let's Encrypt, to add your own signed certificate (your own, since it's in your domain and under a host name of your choice). Or get a *.somemydomain.com one (a so-called wildcard/star certificate)
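As a rough illustration of "generate a self-signed certificate on first boot", the sketch below only assembles an `openssl req` command line that covers both a host name and an IP through the subjectAltName extension. It assumes the `openssl` CLI (1.1.1 or newer, for `-addext`) would be available on the image, which the thread does not confirm; the function name and file paths are hypothetical.

```python
import ipaddress
from typing import List


def self_signed_cert_command(
    hostnames: List[str],
    ips: List[str],
    key_path: str = "bmc.key",
    cert_path: str = "bmc.crt",
    days: int = 3650,
) -> List[str]:
    """Build (but do not run) an openssl command for a self-signed certificate."""
    if not hostnames and not ips:
        raise ValueError("need at least one host name or IP address")
    for ip in ips:
        ipaddress.ip_address(ip)  # raises ValueError on malformed addresses
    san = ",".join([f"DNS:{h}" for h in hostnames] + [f"IP:{ip}" for ip in ips])
    common_name = hostnames[0] if hostnames else ips[0]
    return [
        "openssl", "req", "-x509", "-newkey", "rsa:2048", "-nodes",
        "-keyout", key_path, "-out", cert_path, "-days", str(days),
        "-subj", f"/CN={common_name}",
        "-addext", f"subjectAltName={san}",
    ]


print(" ".join(self_signed_cert_command(["turingpi.local"], ["192.168.0.123"])))
```

A certificate produced this way still has to be trusted manually on each client, which is the limitation discussed in the messages above.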
r
What do you think about trying to have the minimal firmware check for the SD card and, if it finds one, mount it as an OverlayFS? That way you could customise and configure the firmware installation persistently without touching the eMMC storage on the TPI2. Not sure how much sense this makes compared to the option of using a dual stage boot where the firmware just boots up whatever OS it finds on the SD card... Still learning about Linux boot loading processes and embedded devices in particular, so... please bear with me. 🙂
d
Oh I'm still learning a lot too 🙂 I'm not sure about the OverlayFS, I might check it. Then booting a full OS from the SD card also sounds like a good idea. I'm still considering options.
r
In both cases you would have to switch root to the SD card, methinks.
d
So, the idea I have is to have a root file system on the SD card and the minimal one (with SSH root logins, etc) in the internal storage (it'll boot in case the SD-card one would not boot giving you an option to debug and flash new firmware if necessary)
r
Something like that. The dual stage approach is how I understand any PC's boot loader to work (might be wrong or overly simplified). The difference here is that we may want a real OS as a fall back and not just a bootstrap to get the real deal going.
In my previous setup (prior to getting the TP2 board), I used one node with emmc and WIFI as the master that would run a dnsmasq (dns/tftp) server for the other CM4 lite nodes without WIFI to netboot from. The WIFI on the master node was used as a gateway for the local network used by all the nodes of the cluster. The idea with adding dnsmasq to the BMC/SD card is simply building on this and would be an improvement as we would not waste node resources on hosting the dnsmasq server and could make all nodes use the same base image. More importantly, we can isolate the control plane of the nodes from the dataplane using VLAN tags. One ethernet port, the BMC and the nodes would be tagged as the control plane, and the other port and the nodes would be tagged as the data plane. This (so far imaginary) setup would provide a good isolation and separation of concerns, where BMC is responsible for management and the nodes, well they are responsible for whatever workload you run.
s
suggestion: modify uboot.conf during build to enable CONFIG_AUTOBOOT_STOP_STR, see https://github.com/lentinj/u-boot/blob/master/doc/README.autoboot
with that enabled, it allows you to stop the boot process and drop into a u-boot shell. Without it, it will autoboot every time no matter what keys are pressed. And if you have a bad board or need to troubleshoot (ie your BMC kernel panics like mine), not having that option enabled limits troubleshooting ability
m
in my homelab I have created a CA on my opnsense and added the authority on most devices so I can create "valid" certificates
d
Correct, but this still makes it self-signed and requires manual work to add the authority to every device in the network that is going to validate the certificate, right? I agree, though, this can be useful since you only need to add the authority.
Note for myself - power on via command for separate nodes: https://discord.com/channels/754950670175436841/754950670175436848/1083045348613820467
m
yeah I need to inject it into the CAs on every device, my CA cert has a 10-year duration so I hope I won't have to do it twice... still, on containers (docker etc) it sometimes requires more fiddling
d
Just want to say that the new firmware versions are going to be slightly delayed. I'm figuring out how to compile the firmware for the SD card; this is all quite new for me. There are some custom scripts and modifications for the NAND Flash version, and I need to find out if and how to replicate them for the SD card. Once this is done, I'll have to find out how to make a firmware where the failover OS and the first-stage bootloader for the normally running one are on the NAND Flash and the root filesystem resides on the SD card.
I also don't necessarily want to do all of this on my TPi2 boards, so I ordered myself an SBC with the same CPU as the BMC on TPi2: https://www.aliexpress.us/item/1005004448277970.html
To have the same setup, I also ordered the same NAND Flash chips: https://www.aliexpress.us/item/1005005284945182.html
And to put this all together: https://www.aliexpress.us/item/1005005265080663.html
This will give me a standalone BMC with the ability to replace the NAND Flash if I wear it out. So I'm waiting for this to continue the development.
g
👍
t
👍🏼
b
👍
To: DhanOS memory 🙂 consider having the firmware also shut down the BMC and turn off the power LED when the power button is pressed, so that people know they can remove the power completely. Or do people expect a BMC to stay powered on when the "system" power is off? 🤔
k
Yes, the BMC stays on when you "shutdown" all systems. How else would you turn it back on remotely? The whole point of BMCs is that you don't have to go into a server room to turn on servers after a power cut or accidental OS shutdown.
d
The original question, asked over in #754950670175436848, was whether we shut down the BMC system (since it's running Linux) before depowering the board. And this is the point - to be able to shut it down gracefully before disconnecting the power. Otherwise it should stay running, of course
I then asked to post the thoughts here so they do not get missed
t
In my experience, every data center rack should have remotely-manageable power control. This way, if a piece of equipment hangs, it can be power cycled first to attempt remote recovery before going on-site. That might seem like overkill for an edge device, but the sheer potential volume of edge devices can make it cost effective.
d
I mean I agree, but if you are answering what I wrote - the only reason for that would be when you want to remove the power from the board, so that the BMC also shuts down cleanly
t
I was (maybe tangentially) agreeing with you. If I shutdown the BMC, I expect to power cycle the board. Understand the potentially unintended consequences of "your" feature request. 😉
d
Well, you press the power button for 1 second to turn on the modules, and for 3 seconds to turn them off. So, maybe if you press it for 6 or 10 seconds you can power off the BMC too. Hard to miss the moment when the nodes are shutting down. Also that'd be (as everything) configurable, so you'll be able to disable this feature
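The press-duration idea above could be sketched roughly like this. The 1 s and 3 s values come from the message; treating them as minimum hold times, the 6 s default for the BMC shutdown, and all names are assumptions made for illustration.

```python
from typing import Optional

NODES_ON_SECONDS = 1.0   # from the thread: ~1 s press turns the modules on
NODES_OFF_SECONDS = 3.0  # from the thread: ~3 s press turns them off


def action_for_press(seconds: float, bmc_shutdown_after: Optional[float] = 6.0) -> str:
    """Map how long the power button was held to an action name.

    Pass bmc_shutdown_after=None to disable the (proposed) BMC shutdown.
    """
    if bmc_shutdown_after is not None and seconds >= bmc_shutdown_after:
        return "shutdown_bmc"
    if seconds >= NODES_OFF_SECONDS:
        return "nodes_off"
    if seconds >= NODES_ON_SECONDS:
        return "nodes_on"
    return "ignore"


for held in (0.2, 1.2, 3.5, 7.0):
    print(held, action_for_press(held))
```

Checking the longest threshold first keeps the mapping unambiguous, and the optional parameter mirrors the point that the feature should be possible to disable.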
b
yeah the original question was just about whether the BMC needs a way of shutting it down at all. If it does then holding down the power button for an extra long time as a way of doing that is a good idea. Otherwise you need to separately SSH into it or use its API before removing the power from the board.
d
TODO: let the `tpi` command turn modules on and off separately
s
Honestly Redfish would be a better starting point than rolling your own for things like selecting submodules (e.g., blades) to control, handling provision of boot images, power cycling, etc. Perhaps you could start with the OCP mockup? https://redfish.dmtf.org/redfish/v1 That's why this was invented.
d
Honestly, I don't know. These firmware updates, so far, were just simple updates that did not take long to make. Starting over with firmware from scratch is a bigger time investment. We know that the Team is hiring firmware developers, so some new firmware is going to be written. This does not mean we cannot have a community firmware, but at this stage I'm not thinking about making a completely new thing, but rather about providing the missing pieces that people often ask about until a new firmware is made by the Team. We'll see, though.
g
@DhanOS (Daniel Kukiela) I can't seem to get your v0.1.0 firmware to flash via the OTA method. I follow the instructions to upload the image, a response page loads, and then the bmc reboots. After it reboots I check to see what the MAC address is and it's still random and I can't ssh in with root. Got any advice on what's going wrong?
d
This means the flashing does not succeed. Do you potentially have UART? UART will let us see the logs and the messages about flashing and let us know what's going on.
Also, when you upload the firmware, some JSON object is being printed in the browser. What does it say?
Can you tell me which file (the extension of the file) you are using?
g
I don't have UART yet, I was hoping to avoid having to order the converter but may just have to now. I am using the file turing_pi2_ce-0.1.0.swu and this is what is being printed to the browser (ip address redacted):
Content-Type: text/plain
VARS:
HTTP_CONNECTION=keep-alive
HTTP_CONTENT_LENGTH=25036499
HTTP_CACHE_CONTROL=max-age=0
FILE_FILENAME_file=/tmp/tmp-0.tmp
UPLOAD_DIR=/tmp
HTTP_ACCEPT_ENCODING=gzip, deflate
HTTP_ORIGIN=http://xxx.xxx.x.xxx
FILE_SIZE_file=25036288
HTTP_CONTENT_TYPE=multipart/form-data; boundary=----WebKitFormBoundaryBayjokm3KqqgmF6c
HTTP_REFERER=http://xxx.xxx.x.xxx/index.asp
HTTP_ACCEPT=text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7
type=firmware
HTTP_ACCEPT_LANGUAGE=en-US,en;q=0.9
HTTP_UPGRADE_INSECURE_REQUESTS=1
FILE_CLIENT_FILENAME_file=turing_pi2_ce-0.1.0.swu
HTTP_HOST=xxx.xxx.x.xxx
HTTP_USER_AGENT=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/111.0.0.0 Safari/537.36
FILE_CONTENT_TYPE_file=application/octet-stream
opt=set
{"response":[{"result":"ok"}]}
d
Yeah, I understand, you should not need this cable
the file is correct and what's printed is also fine
Hmmm
I'm trying to think how to see what's going on
May I encourage you to try to flash the `turing-pi_1.0.1.swu` from here: https://github.com/daniel-kukiela/turing-pi-2-community-edition-firmware/releases/tag/v0.0.0 ? This is the latest official firmware. Maybe your board has some other version and this causes issues. And after this, try to install mine again?
g
I've also tried connecting with adb but my computer never detects the device
d
Which port are you using for ADB?
Did you install ADB?
g
I followed the instructions on Turing's page. I didn't see anything about a port for adb. https://help.turingpi.com/hc/en-us/articles/8686945524893-Baseboard-Management-Controller-BMC-
d
From these instructions:
g
Yup
d
Oh, you meant the port:
This one. Just making sure
g
Yeah
You asked what port I was using
d
Yes, and if you installed ADB
So you installed ADB and are using this port, right?
g
Yes
d
Hmmm, this is interesting because people usually don't have issues with ADB
Is this a Windows PC, Linux PC or Mac?
And additional question - do you possibly have an Arduino (or eventually a Raspberry Pi or similar device)?
g
Windows and I've got a few pi's lying around
d
You should not have any issues on Windows. Did you, however, try to restart it? Does the `adb` command work?
As for RPis - may I propose using one for UART to see the boot and update logs? You'll need 3 wires to connect it to the TPi2 board
g
when I have my phone connected to my pc the adb devices command finds my phone.
d
Is your TPi2 powered for this? I think it needs to have power
g
it is
d
Ok, I connected my board to my PC (also a Windows PC), downloaded adb tools and ran `adb devices`:
>adb devices
List of devices attached
0402101560      device
I can clearly see my board instantly
I'm just checking, I did not use ADB with this board yet
When you connect your board, do you hear the sound that Windows emits when you connect a new device?
I'm trying to make sure the cable is ok and the board is actually connected
g
I don't hear any sounds and I've tried several cables and different usb's on my pc too.
d
But you do hear this sound when you connect, for example, a pendrive, or the phone you mentioned. Right?
g
when I plug in a flash drive yes, phone no.
d
Interesting, because it should emit the sound when you connect a phone (it, for example, does this for me)
I'm trying to think how to help you with adb
Like what the reason might be you don't see it
g
I'll try to track down my canakit box that has wires tomorrow so I can connect a Pi to the UART.
d
It mentions you have to do some steps on both RPis - you need to do them on the RPI only in our case, the BMC is all set for UART communication
g
gotcha, I'll take a look tomorrow. thanks for the help so far, hopefully we can get this figured out.
d
I'm sure we can
As for `ADB`, head to the Device Manager and see if you have any unknown device listed (like one with missing drivers)
You might also try to restart TPi with a cable connected, wait for 30 seconds and try again
g
nothing even after a restart of TPi
d
This is odd, to be honest
What about this?
g
tried that, still getting the random mac address
d
Yeah, ok, at this point best we can do is to see the serial console output
g
sounds good. talk to you tomorrow once I find the wires.
d
👍
g
I finally got the BMC updated after reformatting the sd card but I'm still not able to connect via the adb...
d
So the issue was an SD card that you inserted? Interesting. Did you flash the board with the official firmware linked from the TPi help pages?
g
Yeah, when I tried to flash it last week I got a message that said there was no sd card installed for the bmc. I formatted one I had lying around but it would never mount so I bought a new one and was able to get that to mount. Last night I reformatted it to ext3 but that wouldn't mount either so I reformatted it to exfat and that seemed to do the trick. I flashed your firmware.
d
Oh, so this was actually an important bit of information. If I had known it, I would have known what's wrong. The question, though, is how you got the firmware that requires an SD card on your board. Probably not from the factory.
g
I was trying to flash v1.0.0 from their github page not realizing that it wasn't a new version.
d
So they uploaded this modified version. Good to know.
I still don't know what's wrong with your `adb`, though
g
Yeah, me neither. I was wondering if it might be because I have autorun disabled on my pc, but I'm able to see my phone when I plug it in...
g
How difficult would it be to have a BMC terminal embedded in the web interface?
d
Should be relatively easy to implement.
w
Feature request: simple way to send "magic sysrq keys" to nodes. It seems like C-c or maybe C-@ is the way with microcom but would be way easier to have a tiny tool for it. Maybe I can just write it myself and compile
d
Yes, my idea is to give every node a special login and password and use them for things like powering down, obtaining an IP (to show node IPs), grabbing the fan PWM speed, system load, memory/CPU stats, etc. So the plan is a bit more than just power off 🙂
s
Honestly, if we had just started from OCP with Redfish rather than using a mobile phone chip as a BMC this would all be easy
d
I've heard this already. A new firmware is being developed by the Team. This is a temporary solution, a great base for future updates, meant to avoid going down a rabbit hole immediately and making people wait weeks for something usable 🙂
w
What compiler does one use to compile programs for the BMC? I don't see some other toolchain being installed in the instructions you give...
hmm
Linux turing 5.4.61 #31 SMP PREEMPT Thu Oct 20 00:11:14 CST 2022 armv7l GNU/Linux
I suppose just armv7 then
f
For the BMC, what size Micro SD card would do?
t
that's a good question; I have a few sitting around... was thinking about throwing a 128GB on there just so I didn't need to uninstall my board to swap cards in the future
I'm curious what it can support, some controllers for microSD have limits
f
indeed, there seems to be nothing detailing it whatsoever in the docs website
t
the BMC (Allwinner T113-S3) supports the SD 3.0 specification; I'm trying to determine what all that means
depending on whether that means SD 3.0 or SD 3.01 it probably means a max of 32GB to 2TB
the max for the BMC is somewhere between those two capacities...
f
yeah
t
unfortunately, this doesn't narrow down the possibilities
it definitely supports Class 10, but may or may not support UHS-I
definitely doesn't support UHS-II
f
Hmm
any idea @DhanOS (Daniel Kukiela) about the SD card spec for the BMC, if it's max spec 3.0 or 3.01?
From what I can gather, even 3.00 supports 2TB and Class 10 - https://www.taterli.com/wp-content/uploads/2017/05/Physical-Layer-Simplified-SpecificationV6.0.pdf Page 5: "The Part 1 Physical Layer Specification Version 3.00 or later and Part 2 File System Specification Version 3.00 or later allow Standard Capacity SD Memory Cards to have capacity up to and including 2 GB, High Capacity SD Memory Cards to have capacity up to and including 32 GB and Extended Capacity SD Memory Card to have capacity up to 2 TB"
Also Page 176, "Optional conditions to indicate Version 3.00 Card": "A card supports any of following functions shall satisfy essential conditions of Version 3.00 Card (1) Speed Class supported under the conditions defined in Version 3.00 (2) UHS-I supported card (3) CMD23 supported card"
UHS-II is 4.00 onwards
i'll probably go by that for now, cheers
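For anyone wanting to sanity-check a card against the capacity ceilings quoted above, here is a minimal sketch. The function name is made up for illustration, and it assumes decimal units (1 GB = 10^9 bytes), which is how card vendors label capacity; the SDSC/SDHC/SDXC labels are the usual abbreviations for the Standard, High and Extended Capacity classes in the quote.

```python
GB = 10**9   # assumption: decimal gigabytes, as printed on card packaging
TB = 10**12


def sd_capacity_class(capacity_bytes: int) -> str:
    """Map a card capacity to the class named in the quoted spec text.

    Hypothetical helper for illustration only; it says nothing about what
    the BMC's SD controller can actually address.
    """
    if capacity_bytes <= 0:
        raise ValueError("capacity must be positive")
    if capacity_bytes <= 2 * GB:
        return "SDSC"   # Standard Capacity: up to and including 2 GB
    if capacity_bytes <= 32 * GB:
        return "SDHC"   # High Capacity: up to and including 32 GB
    if capacity_bytes <= 2 * TB:
        return "SDXC"   # Extended Capacity: up to 2 TB
    return "beyond SDXC"


if __name__ == "__main__":
    for size in (2 * GB, 32 * GB, 128 * GB, 985 * GB):
        print(size // GB, "GB ->", sd_capacity_class(size))
```

By that reading, both a 128GB and a 985GB card fall in the Extended Capacity range.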
b
I have a 985GB card in my bmc, it sees the card fine and reports the correct capacity but I haven’t tried writing or reading on it yet.
t
thx
f
I saw some of those weird GB sized ones on Amazon like 985, not singling you out specifically, but does anyone know why they exist and not just the usual 1TB's?
never seen them before up until now
t
Likely a no-name manufacturer. NAND manufacturers "bin" their die just like CPU manufacturers do. Not all NAND die pass the manufacturer's criteria for integration into their name brand products. The rejects, which are spooled for SMD equipment, get sold in bulk. The weird capacity basically says that some NAND pages were marked bad.
d
Well, looks like this question is already answered. But what would you use the card for there?
f
Doesn't the BMC require one? I tried to update it and it whined about requiring one
Content-Type: text/plain {"response":[{"result":"err:no sdcard"}]}
d
It does not require one. I mean, if you updated the firmware to the official 1.0.1, then it does indeed require a card for the firmware upgrade process, but only for that (the firmware is not being flashed to the SD card). The CE version of the firmware does not have this requirement
f
I even get the same error with turing_pi2_ce-0.1.0.swu
d
Oh, and just about any card will do, the firmware is only a few tens of MB
Because you flashed 1.0.1 on your board. Once you flash
turing_pi2_ce-0.1.0.swu
onto the board, it won't require an SD card for flashing anymore
f
Ahh ok, and only because it's on 1.0.1 even to flash to the CE it'll require it, and just for that only?
d
Yes, 1.0.1 requires an SD card to flash any firmware
f
okee 🙂
Can you shunt a (temporary) USB device to it instead, when the port is in host mode?
d
I'm not sure what you mean
t
Yeah. Not really liking the fact that v1.0.1 doesn't have a web configuration page for BMC networking or just ingest the tpi.ini file from the microSD.
w
I mean really the BMC should behave like any other device and just have a randomly generated MAC that is stable per physical device
like every other device in the world
you don't have to configure a raspberry pi or even ESP32's MAC address
If the chip doesn't have some kind of serial number, it should try to find some source of randomness that is relatively stable per physical device and then store that MAC somewhere
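A rough sketch of that idea, assuming the firmware can read some per-device identifier (a chip serial, a flash UID, whatever turns out to be available on the T113; that part is an assumption). It hashes the identifier and forces the "locally administered, unicast" bits so the result is a valid private MAC. `stable_mac` is a made-up name, not something in the firmware.

```python
import hashlib


def stable_mac(device_id: str) -> str:
    """Derive a repeatable MAC address from a per-device identifier.

    The same device_id always yields the same MAC; different ids almost
    certainly yield different ones.
    """
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    octets = bytearray(digest[:6])
    # Set bit 1 (locally administered) and clear bit 0 (multicast) of the
    # first octet, so the address cannot collide with vendor-assigned ranges.
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{b:02X}" for b in octets)


if __name__ == "__main__":
    # "example-serial-0001" is a placeholder, not a real chip serial.
    print(stable_mac("example-serial-0001"))
```

The result would still need to be stored or recomputed on every boot, and the identifier source has to be stable across reflashes for this to help.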
f
It doesn’t have any authentication for the web interface too, which is oddly terrifying
For the firmware update, to use a usb drive for the storage
d
It is going to, though. This firmware is based on the original firmware which is in the alpha state
The BMC only uses the micro USB port and, as far as I am aware, this port cannot be set into host mode. The update should not require any additional storage, at least for now. The BMC's additional storage is the SD card slot
f
But I’m on 1.0.1, and both it and you said it needs a micro SD card to then update to a later firmware or even to the CE version. I don’t have a spare SD card, so I was wondering what other options there were to move to CE
d
None, an SD card is a requirement. This is why I'm mentioning this and encouraging people to move to the CE version of the firmware
w
They can't reflash using the otg port?
s
... it's likely asking a lot, but is there any likely possibility of a kernel/userland update to allow TRIM/DISCARD on the BMC's SD card?
(Is the BMC SoC one of those awkward ones which isn't supported by stock Linux, and needs a heavily-hacked source tree which hasn't been integrated upstream?)
d
The board can be re-flashed using this port and Phoenix Suite, but the process is not the easiest. I meant SD card is required now for OTA with this firmware version
I'm collecting all the ideas and I'll be trying to add them all 🙂
s
As-per https://discord.com/channels/754950670175436841/754950670175436848/1096066131397185566, could the documented support for ext2/3(/4?) filesystems on SD cards please be added/restored?
d
@DhanOS (Daniel Kukiela) request for CE firmware: show the version of the firmware somewhere in the UI? Maybe on the update page "Current version is XXX"?
I am using it, thank you for making it available. I'm excited to see what you come up with next
d
Yeah, this is a good idea and I'm going to do that
There are many things to come, I just need a bit of time to get back to it
d
I get it. And no rush, I just figured this is a relatively small change (ha!) that would make it better
u
fyi for anyone trying to build: sources.redhat.com is not reachable (at least for me) so the LVM2 sources fail to fetch and the build hangs. my workaround is to replace the LVM2 URL with the upstream FTP server URL instead:
Copy code
turing-pi-2-community-edition-firmware on  master [!?] 
❯ head ./buildroot/package/lvm2/lvm2.mk
################################################################################
#
# lvm2
#
################################################################################

LVM2_VERSION = 2.03.14
LVM2_SOURCE = LVM2.$(LVM2_VERSION).tgz
#LVM2_SITE = http://sources.redhat.com/pub/lvm2
LVM2_SITE = https://sourceware.org/pub/lvm2
t
Thanks so much for your work on this. In addition to the excellent suggestions of others, I have three ideas: 1. Ability to set certain hosts to startup automatically on BMC boot. I know others have suggested a custom boot order already, but this would allow unattended restores after a power outage. I have a kludgy workaround via an /etc/init.d/* script but something more accessible is probably more broadly useful. 2. A way to set a custom hostname for the BMC and provide ssh keys. This could be via the web interface or SD card file, and I'm ambivalent but think it would make sense to mirror whatever option you end up with for MAC address setting. 3. At the risk of starting a flame-war, including the nano text editor binary as a vi alternative.
When time permits I want to kick the tires on how exactly the GoAhead web server works. While day-to-day I'm likely to run it through an nginx reverse-proxy on a tpi2-hosted Raspberry Pi (and manage session encryption and authentication from there), I like the idea of having some remote troubleshooting access in case the Pi becomes inaccessible. It might well be better to use certificate-only SSH, instead of the complexity and potential attack surface if I go down the hardening GoAhead and/or using firewall rules route, but I still need to get my head around it. (as a note to future me, and others: basic config is at /mnt/route.txt, usernames and passwords at /mnt/auth.txt, and hosted files at /mnt/www) I'd also like to work out if it's possible to use cron on the BMC. The binary is installed already, but the files it needs are missing and get wiped on reboot if I manually create them. As a workaround I've had some success inserting extra scripts into /etc/init.d/ for some tasks (eg startup host 4) but they occur on start/shutdown only, rather than during routine operation.
d
Thank you for the suggestions. I'm up for all of them. Some of them are already planned (like a config file that will let you set a hostname, mac, ip, node boot order/delay/automatic boot, etc)
I think I need to go through this thread and create issues on GH for each of the ideas - this way it'll be easier to track them
t
Thanks so much!
r
@DhanOS (Daniel Kukiela) just got my board this week, I've plugged in a Jetson Nano but it's a real pain to flash. What would be good is an option to set a node into the board's recovery mode as outlined in this table from the [Nano datasheet](https://developer.download.nvidia.com/assets/embedded/secure/jetson/Nano/docs/JetsonNano_DataSheet_DS09366001v1.1.pdf). Unsure if you'd have the ability in software to set a specific pin Low/High?
d
I'm assuming you've seen the official docs. Setting a given node to device mode also puts it into recovery mode. Flashing in node 1 seems to not always work (some boards have issues with that) so you might try node 2-4
w
I love
micro
as a text editor, if we wanna get into holy wars 😄
(in all seriousness if you like nano for "works like a normal text editor", micro is a breath of fresh air in that it is like that even more so. I don't know of anybody who uses nano because they actually like it, just because it stays out of their way)
d
We're on a strict space constraint before we move to using the SD card, so probably only one of them for now 😉
w
Copy code
wijk% ls -lh $(which micro)
-rwx------+ 1 ubuntu ubuntu 11M Feb 20 02:11 /mnt/seaweed/workspace/bin/micro
wijk% ls -lh $(which nano)
-rwxr-xr-x 1 root root 269K Feb 19  2022 /usr/bin/nano
ok fine
d
11M 😮
No, definitely won't fit
I mean even if I wanted, 11MB is way more than the space we have left 🙂
w
I suppose I can easily just
go install
it with my
GOPATH
on SD
t
My day-to-day text editor is VS Code, but I still have muscle memory for nano kb shortcuts
d
I'm using VS Code as well, Sublime Text before, and Nano for the console stuff
With the root mounted on the SD card, we could probably have 2 more things - OS updates via a command and packages installed via a command
Like no more flashing of anything to update the BMC or add some feature/tool
w
\o/
Web based serial console. webserial.io (though that seems to use browser serial ports) or https://github.com/ayushsharma82/WebSerial (but that's ESP32/ESP8266)
d
I started working on an alternative web interface for node management using more modern tooling this weekend and want to discuss potentially rolling it into the CE BMC firmware. Is that something you'd be interested in @DhanOS (Daniel Kukiela) ? I'm currently developing it with a locally running API proxy to get around CORS issues, and can see it running that way to provide an interface to multiple Turing Pi boards. The BMC API also leaves much to be desired and I am debating adding more work to the proxy server so that it provides a nicer RESTful or GraphQL interface. Don't have much done yet, just re-imagining the interface to allow the user to get a visual state overview and quickly adjust power and USB device/host mode.
In many cases users are going to automate a lot of this interaction in different parts of infrastructure, but I like making this pretty for those who just have one node and like pressing buttons.
d
Sure, that'd be great!
I'll re-read it when I wake up, I'm too tired now to fully get what you mean by the proxies
No problemo
d
The web server on the BMC is... well, what I can only say is that it is. I'm not yet sure what capabilities we have
We also want some sort of login before anything can be accessed
d
Yeah, I looked through the webserver C code, very basic in there.
d
Also take a look at a way to show console output or maybe even accept console input? I want to add at least printing the UART output, but being able to actually also input commands via the web would be epic
I also want to make it possible to edit different BMC settings like MAC, IP, hostname, node power setting, etc, but we'll get to this point I guess
If we need something better for the web server, we'll look into it, but also we only need to be able to serve static files and otherwise query some API endpoints that are working already, so we might have all we need
d
Yup, at the moment, the basics are already exposed, in a non-restful way. Just working around it and thinking of improvements.
d
I'm not sure if we want to go restful way. JSON API seems enough, or maybe we can implement JSON-RPC (which should be a better fit here than a RESTFUL API). I haven't thought much about that either.
w
oh I wondered if you just built a new UI to the existing API
d
The API might and most likely will be modified to better fit the needs
d
Just to clarify...to update to the CE version, it needs to be done via SD card. SWU version of the ce firmware on an sd card, and it will install from the BMC SD card on power up? Seems like I am missing a fundamental step or two, and the last thing I want to do is take risks with the BMC firmware.
d
You update it the official way, via the web interface which contains the firmware update feature (OTA). If you happened to update the firmware to the version 1.0.1 before, you'll need an SD card inserted, since 1.0.1 of the official firmware has this requirement
d
Ahhh..so nothing on the SD card, it just needs to be in the slot? I do have 1.0.1 installed.
d
Yes. 1.0.1 uses the SD card as a temp folder. A bit unnecessarily IMO
d
Perfect, thank you. I'll give it a try.
d
👍
d
A quick snapshot of the mockup, shows usb selector state, powered state via numbered avatar circle, and then a power toggle. Not much there but a start.
Was thinking of adding contextual hotkeys for different needs, like power and USB host mode switching with a confirmation dialog. Could also be used to chain sequences, like confirm power off and then USB mode switching.
t
The tpi command's functionality lags a bit from the web interface. I'd like to see the "poweron" and "poweroff" options/functions be more granular. Currently, "poweron" turns on SATA power, then each node in order. "poweroff" turns off each node in reverse order, then SATA power. I'd like to be able to power on/off nodes individually or in groups. The SATA power on/off function should be called when any node will be powered on and when all nodes are powered off. The web service performs the former, but does not power off SATA when all nodes are powered down.
The power button logic is pretty dumb. It doesn't examine the current node and SATA power state.
d
A granular way of controlling node power is planned. This is going to be configurable so you can set an order and a delay, auto power on of certain nodes, an ability to turn nodes on/off individually via the cli tool, etc. So indeed this is going to be added
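To make the "order and delay" idea concrete, here is a minimal sketch of what a configurable power-on sequence could look like. Everything in it is hypothetical: the plan format (node number plus the delay to wait before powering it) and the `power_on` callback stand in for whatever the config file and the BMC power routine end up being.

```python
import time
from typing import Callable, Iterable, List, Tuple


def power_on_sequence(
    plan: Iterable[Tuple[int, float]],
    power_on: Callable[[int], None],
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Power nodes on in the given order, waiting before each one.

    plan is a list of (node, delay_before_seconds) pairs. Returns the order
    in which nodes were powered on.
    """
    powered: List[int] = []
    for node, delay_s in plan:
        if node in powered:
            raise ValueError(f"node {node} listed twice in the plan")
        if delay_s > 0:
            sleep(delay_s)          # e.g. give an NFS server time to come up
        power_on(node)
        powered.append(node)
    return powered


if __name__ == "__main__":
    # Example: node 3 first, then nodes 1 and 2 after short waits.
    power_on_sequence([(3, 0), (1, 2), (2, 1)],
                      power_on=lambda n: print("power on node", n))
```

Nodes left out of the plan simply stay off, which would also cover the "auto power on of certain nodes" case.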
t
Yeah, I can envision lightweight control logic running on the BMC and heavier weight logic using the RPC from a separate orchestration console.
d
As for the SATA power, this is not controllable. The way it works is that the BMC is powered via the +5VSB line, which is always powered even if the PSU is off. It's the same in a PC and lets it react to the power button, wake on LAN, use USB ports to charge when the PC is off, etc. Whenever the first node is powered, the PSU turns on and then also powers the SATA connectors. It should power off the connectors when the last node is turned off, but this indeed does not work IIRC; it's a software thing.
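The intended behaviour (PSU on with the first node, off with the last) can be sketched as a small piece of bookkeeping. This is only an illustration in Python, not the firmware's C code; `set_psu` and `set_node` are hypothetical callbacks standing in for the POWER_EN and per-node GPIO operations.

```python
from typing import Callable, Set


class PowerController:
    """Tracks which nodes are on and switches the PSU accordingly."""

    def __init__(
        self,
        set_psu: Callable[[bool], None],
        set_node: Callable[[int, bool], None],
    ) -> None:
        self._set_psu = set_psu
        self._set_node = set_node
        self._on: Set[int] = set()

    def node_on(self, node: int) -> None:
        if node in self._on:
            return
        if not self._on:
            self._set_psu(True)      # first node: bring the PSU up first
        self._set_node(node, True)
        self._on.add(node)

    def node_off(self, node: int) -> None:
        if node not in self._on:
            return
        self._set_node(node, False)
        self._on.discard(node)
        if not self._on:
            self._set_psu(False)     # last node gone: PSU (and SATA) off


if __name__ == "__main__":
    ctl = PowerController(
        set_psu=lambda on: print("PSU", "on" if on else "off"),
        set_node=lambda n, on: print("node", n, "on" if on else "off"),
    )
    ctl.node_on(1)
    ctl.node_on(2)
    ctl.node_off(1)
    ctl.node_off(2)   # only this call switches the PSU off
```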
The power button currently just turns off all nodes. Do you have any idea here? There's (currently) no way to turn the node OS power off via this button, but with the help of some scripts, that'd be possible
I think I like it, but could there be a way to make 2 interfaces? I also kind of like something more like a Proxmox-style GUI and I think I might not be the only one. I'm not asking you to make 2 interfaces, but maybe keep in mind that they might be switchable. Also, the host mode - it'd be nice to make it so you can denote if it's USB power on the back or BMC that the USB is connected to (I might try to make it so we can flash at least CM4s from the BMC to start)
t
I beg to differ on "SATA" power control. If you look at the tpi.c code under the poweron and poweroff functions, you'll find two GPIO operations. poweron does: common_device_gpioWirte(POWER_EN, GPIO_H); (Line 265) This enables power to the nodes, which remain powered off, and SATA. poweron also does: common_device_gpioWirte(SYS_LED, GPIO_L); (Line 267) This turns on the LED attached to S-LED. poweroff does similar common_device_gpioWirte operations to turn off node and SATA power, then the S-LED. These are on Lines 292 and 294.
d
What this does is turn on the PSU by interacting with pin 16 (PS_ON) of the ATX power connector:
Then, I might be missing what you mean
With this enabled, not only are the SATA ports supplied with power, but the whole PSU starts running and delivers different voltages, also to the Turing Pi 2 board (including node power)
So this is not a SATA power, but power in general
The BMC is being powered by always supplied +5VSB (stand by)
t
I have a modular SFX-L PSU. My fans are attached to a SATA power cable. The POWER_EN GPIO operation turns the circuit on/off. I'm not using the PSU's other outputs at this time. I just want to be able to control nodes and the PSU power more granularly. When running at max, the fans can get annoying. Yes, I can dial down the case fan speed.
d
OK, but I'm still not sure what you are trying to achieve. To be able to power the nodes ON, you need to deliver power to the board by triggering the POWER_EN GPIO, which, in your case, will also turn on the fans. You cannot turn the nodes on without turning the fans on in your case
The granularity might mean being able to trigger POWER_EN separately from the node power to, for example, spin the disks or fans, but if you want to turn on even a single node, POWER_EN has to be triggered since this turns on the PSU and delivers the power to the board
Does this make sense?
Also I apologize, but I have a hard time fully understanding what you are trying to achieve
b
yes, thanks! nice work!!
e
do we think there will ever be more functionality to control the ethernet switch? I.E. add vlans, or bond the two 1gb ports out of the board etc?
d
This is one of the things I want to do and will definitely look into it.
Feature note: an ability to format and mount an SD card through the web interface? not sure how this fits the plan of moving the filesystem root onto the SD card (but also don't know when we move it there)
u
maybe "re-initialize" instead? User must supply a fresh rootFS image to write?
d
I mean, these are 2 things. One is to be able to use an SD card right now and another to move the root filesystem on the card. I haven't yet decided how the flashing process is going to look, but probably something like flash this via OTA and then flash another file via OTA to set the SD card, or the BMC is going to prepare the SD card automatically and copy the files over. If possible, I'd like to save users from flashing the SD card by hand (using their PC/Mac), but I'm not yet sure if this can be done this way. We'll see.
u
what about leaving a minimal BMC image on the included flash, like we have now, but then doing something like an overlay from the SD card? That way a user can still use the board right out of the box, but if they flash the "enhanced" firmware it gives them the option to store extras on the SD card image.
d
This is kind of the plan. The root partition is actually placed twice in the BMC's storage - one is a failover in case flashing did not succeed. I want to replace the root filesystem device only on one of them and move the entire filesystem onto the SD card. It's like 20-ish MB in size
The goal then is that if the SD card breaks, the failover filesystem is being used instead and will provide the web interface to flash a new SD card
u
right, ok I see how that would work... that answers the question then: To format the SD card from the web UI, you can clone one of the rootfs partitions on to the SD card and expand it
d
This is one possibility, but the goal is to be able to put more binaries there and they would not fit into the flash partition (this is part of why the filesystem will be moved on to an SD card)
u
hm right...
d
Feature idea: an alternative to microcom? minicom? Is freezing caused by microcom?
u
screen?
p
screen
should be available in buildroot, would be good to give it a try
d
Copy code
login as: root
root@turing's password:
# screen
-sh: screen: not found
p
Have to run
make menuconfig
and it's under the Terminal/Emulators section
d
Oh, you said buildroot
Yeah
But it looks like it's 1MB so it probably won't get into the firmware for now, but we'll see
p
Everything has been amazing between the community and the devs. Props on your efforts @DhanOS (Daniel Kukiela) !
w
a bit more fully featured
d
I don't think we need much, just to run stable
w
Copy code
wijk% ls -l picocom
-rwxr-xr-x 1 andrew admin 57716 Apr 18 01:33 picocom
wijk% strip picocom
wijk% ls -l picocom
-rwxr-xr-x 1 andrew admin 44036 Apr 18 01:33 picocom
here we go
wijk% make CC=!arm:0 CFLAGS='-static -Os'
d
I'm kind of curious if this is a microcom issue, or the fact that the firmware is also reading UART
w
that's with
-static
because I can't be bothered getting the right glibc where I'm building it
d
40-60kB is great, we definitely can test it out
w
huh, it's not working though, still a glibc issue
d
But then, again, the firmware also reads the UART and a question is if they interfere with microcom/picocom/whatever else
w
maybe I do need to get the right glibc version
d
I did not look into it, but if we can integrate it into the webservice, then no matter the firmware, the web panel or a cli tool could use the same established connection and interact with it
It looks like picocom got its last update 5 years ago
w
I've been having fun trying to get the right toolchain setup. I'll maybe see about publishing a dockcross image tailored for the BMC
=> [4/7] RUN mkdir /dockcross/crosstool && cd /dockcross/crosstool 10833.0s
ok so I have a working toolchain, and successfully built picocom
Copy code
$ docker run werdnum/dockcross-tpi2-bmc >./dockcross-tpi2-bmc
$ chmod +x ./dockcross-tpi2-bmc
$ ./dockcross-tpi2-bmc make
(it's an ARM64 docker image)
took forever to build too
r
I tried to make the build work on my M1 Mac some time ago (there is an old thread lying around somewhere in this discord server) but never got it working as the default toolchain depends on i386 tools. Tried building in docker but did not get that to work either. So
dockcross
gives me some hope that I might be able to join the fun without getting an Intel PC to build ARM images on... Seems to be a problem though:
Copy code
➜  dockcross git:(master) docker run werdnum/dockcross-tpi2-bmc >./dockcross-tpi2-bmc
➜  dockcross git:(master) chmod +x ./dockcross-tpi2-bmc
➜  dockcross git:(master) ./dockcross-tpi2-bmc make
Unable to find image 'containers.andrewgarrett.dev/dockcross/linux-armv7-lts:20230418-484f06a' locally
docker: Error response from daemon: Get "https://containers.andrewgarrett.dev/v2/": Service Unavailable.
See 'docker run --help'.
Any idea what is missing?
d
Is it a private container registry?
w
Oh lol that's my mistake then, I'll have to fix the image
You can manually fix the image name in the script to point to the same correct image
The i386 image is a stupid bug in dockcross. One of the scripts just has if uname -a equals "x86_64" then amd64 else i386. I just had to fix that
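For anyone curious, the shape of that bug and one way to avoid it can be sketched like this. It is a Python illustration, not the actual dockcross script (which is shell), and the mapping table is an assumption about which labels are wanted; the point is only that unknown machines should fail loudly instead of silently falling back to i386.

```python
import platform
from typing import Optional

# Illustrative table; extend it for whatever hosts you need to support.
_ARCH_LABELS = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",   # Linux on 64-bit ARM
    "arm64": "arm64",     # macOS on Apple silicon
    "i386": "i386",
    "i686": "i386",
}


def arch_label(machine: Optional[str] = None) -> str:
    """Map a uname-style machine string to an architecture label."""
    name = (machine or platform.machine()).lower()
    try:
        return _ARCH_LABELS[name]
    except KeyError:
        # The buggy version effectively did "else i386" here.
        raise ValueError(f"unsupported machine type: {name!r}") from None


if __name__ == "__main__":
    print(arch_label())
```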
r
Stupid me - did not realise it was a script, assumed it was a binary...
w
I forget the directory but it's like install gosu wrapper.sh or something
I also hacked the armv7-lts configuration to use glibc 2.25 which is why I rebuilt the image in the first place
r
So... building I get an error for the
cjson-rebuild
target (following the instructions in the
readme.md
of the official build...)
Copy code
$ ./dockcross-tpi2-bmc make -C buildroot  BR2_EXTERNAL="../br2t113pro"  100ask_t113-pro_spinand_core_defconfig

make: Entering directory '/work/buildroot'
...
#
# configuration written to /work/buildroot/.config
#
make: Leaving directory '/work/buildroot'
$ ./dockcross-tpi2-bmc make -C buildroot cjson-rebuild
make: Entering directory '/work/buildroot'
...
>>> toolchain-external-custom  Configuring
Cannot execute cross-compiler '/work/buildroot/output/host/opt/ext-toolchain/bin/arm-linux-gnueabi-gcc'
make: *** [package/pkg-generic.mk:282: /work/buildroot/output/build/toolchain-external-custom/.stamp_configured] Error 1
make: Leaving directory '/work/buildroot'
Note that I am using the -C switch to enter
buildroot
as the build script can't find the path
../br2t113pro
.
I assume that issue is caused by docker mounting the current dir for the build.
w
Yeah you have to do it in the directory
d
BUG: while turning all nodes off via the web interface (test the tpi tool), this does not turn off the PSU with the last node. It works ok with the
key1
button
j
> test the tpi tool
Since
tpi
calls the api, same as the web interface, the issue happens in either. I have tested with
tpi
here, both from the BMC and from a remote host.
p
Have been messing around with the BMC API and threw together a simple UI, should be feature complete by this weekend 🕺. It's
60kb
in size vs the
1.4MB
size of the current UI. Built with Picocss + SvelteKit. It also supports connecting to multiple BMC's from the same UI (as well as long polling the updates). Right now, a reverse proxy is required to patch the Headers until the firmware API is updated. Feedback is welcome! (Planning on getting it to 5x100 on Lighthouse) https://turing-pi-ui.vercel.app/ https://github.com/PhearZero/turing-pi-ui
d
You definitely should talk with @destomes who's also working on something: https://discord.com/channels/754950670175436841/1080282784570019942/1097566740625502298
I'm not sure how many designs we can have at once 😄
w
twice as many as half the number
d
Hah, theirs is a lot further along than mine. And both look to have the goal of supporting use external to the BMC as well as being included in it.
Also, there are so many differing frontend methodologies and frameworks, kinda hard to merge or cross develop things. Will take a look though.
d
Ideally, I'd love it if you could reach some middle ground or agree on one or the other. As I mentioned, I'd love to have something more like Proxmox, or maybe more towards Unraid, but the interface should not be biased by what I like. I guess we could try to have multiple interfaces too
I definitely do not want you both to put time into this and then make us choose between one or the other
I like how it looks and how compact it is. The USB 2.0 toggle does not let you choose between host and device mode, though, and only one of them at a time can have the USB port assigned, so maybe a radio button?
p
Yea I have everything disabled right now, it was mainly just to get a project scaffold. @destomes it should be pretty portable and I'd love to see what you have so far! Since it's all SFC + classless CSS it mostly will be copy-paste. Here is the main css library(also allows Bootstrap Grids): https://picocss.com/
d
I'm thinking of it and maybe we indeed can have a few different frontends so everyone can choose what they like.
p
Yea like a "theme" concept. As long as it's SPA/Static HTML it would be pretty easy to manage too
d
Then, to save space, we can gzip the files which will have an added benefit while serving them over http(s) as we'll only have to change the content type and the browser will automatically ungzip them
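A minimal sketch of the pre-compression step, assuming the files are gzipped at build time and stored that way in the firmware. Browsers decompress transparently when the response carries `Content-Encoding: gzip` while `Content-Type` still names the original type. Whether GoAhead on the BMC can be made to send that header for static files is an open question here, and `precompress` is a made-up helper name.

```python
import gzip
from typing import Dict, Tuple


def precompress(body: bytes, content_type: str) -> Tuple[bytes, Dict[str, str]]:
    """Gzip a static asset and return it with the headers to serve it with."""
    compressed = gzip.compress(body, compresslevel=9)
    headers = {
        "Content-Type": content_type,      # still the original type
        "Content-Encoding": "gzip",        # tells the browser to gunzip
        "Content-Length": str(len(compressed)),
    }
    return compressed, headers


if __name__ == "__main__":
    html = b"<html><body>" + b"<p>node status</p>" * 200 + b"</body></html>"
    data, hdrs = precompress(html, "text/html")
    print(len(html), "->", len(data), hdrs)
```

Since the BMC would not be compressing on the fly, this costs no CPU at request time, only at build time.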
p
BTW just sharing for fun, not a big deal what path the different firmwares take from my perspective. I'll probably end up with my own fork of the firmware. Whatever works best for the most people for upstream, IMO. I'm big into HashiCorp/Mitchell Hashimoto and going to be running Consul/Nomad/Firecracker instead of K8's and wouldn't make sense to force that into upstream (unless enough people want it). Love the work everyone is doing! Keeps me motivated, really talented people in all the channels ♥️
d
I'd love to see any contribution, also your design, in this project. Whatever other ideas you have, some people will benefit from them too. The more I think about it the more I like the idea of having themes for the web interface. I'm a bit further with this idea and if we decide to stay with the embedded memory only, if there is a need, we could implement a feature where only a single interface of choice is going to be installed at a time. Choices are always good
So, I hope to see a pull request with your idea, or if you just say we can use your interface, we can make it fit the firmware - whatever works best
d
That's the beauty of open source.
p
Yea I'll definitely keep digging into the REST services side. That's the main reason for the UI is to test in the browser against the BMC. Was a quick and light frontend that won't weigh me down while I'm making API changes. I'll create a specification for an interface this weekend and we all can do some RFC. Maybe next week we can at least get a shared JS/TS library for the existing API until we formalize the spec
d
The thing is we'll have to decide what to use for the backend. It can be a RESTful API, it can be simple JSON or (my favorite) JSON-RPC, or maybe even something else
d
The stack I chose is also based off a template and uses a few things I haven't used yet and am learning about. One of those being Vite for the bundling and Recoil for state management. So it has been a bit slow starting up. I also feel like we could have an optimal API voice meeting at some point.
p
Same, ProtoBuff is an oldie but goodie. MessagePack is the new hotness but vanilla JSON is fine for something attached to a LAN.
d
I was going to just make a quick FastAPI server with what I consider optimal, and have it typed as well, and just temporarily proxy to the old API. But my understanding is the Turing Pi fork of the original firmware is also being worked on right now. Is it worth it to follow along and contribute to that one's discussions instead?
d
I think it should be rather simple so different tools can use it easily. ProtoBuf or MessagePack - don't think we have to turn the communication into binary. Hmmm
p
Check out swagger/openapi if you haven't seen it yet. It's really useful for sharing API designs: https://editor.swagger.io/
d
No serious work is being done yet. And indeed it'd be great to decide on the API type and rework the backend for it
p
Lol yea React finally realized they need state management. Guess they are no longer "just a renderer" lol
d
Yeah, FastAPI supports outputting a generated openapi spec and hosted swagger UI
p
Sweet, pretty much everything these days is on the bandwagon. It's a lot of fun, I work the other way around with the codegen.
d
I'm leaning towards some REST implementation, hmmm. But the way the interface can be constructed would mean multiple calls to the API. this is where JSON-RPC is what I like - it allows you put all the requests in one object and process them this way
Like if you want to turn power on in Node 1 and set USB mode to device for that node at the same time (or even a different node), that'll mean 2 API calls that won't necessarily both succeed
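For reference, this is roughly what that example looks like as a JSON-RPC 2.0 batch: one HTTP request whose body is an array of request objects, each answered individually. The method names and parameters below are invented for illustration; the BMC API has no such methods today.

```python
import json
from itertools import count
from typing import Any, Dict, Iterable, Tuple


def batch_request(calls: Iterable[Tuple[str, Dict[str, Any]]]) -> str:
    """Build a JSON-RPC 2.0 batch body from (method, params) pairs."""
    ids = count(1)
    batch = [
        {"jsonrpc": "2.0", "method": method, "params": params, "id": next(ids)}
        for method, params in calls
    ]
    return json.dumps(batch)


if __name__ == "__main__":
    # Hypothetical method names, not part of the current firmware API.
    body = batch_request([
        ("node.set_power", {"node": 1, "on": True}),
        ("node.set_usb_mode", {"node": 1, "mode": "device"}),
    ])
    print(body)
```

Each response in the returned array carries the matching `id`, so the UI can tell which of the two operations failed. Note that a batch does not make the calls atomic.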
p
http/2 or QUIC maintains a single connection, they might be good options as well. Nothing wrong with JSON-RPC as well
d
I do not mean maintaining a connection. Also HTTP/2 might be a bit overkill for the BMC
p
Yea not sure, I've always leveraged nginx and it's been fine. As far as KISS goes, regular REST with "bulk" endpoints could optimize the requests
d
For the purpose of this firmware, we have to keep the HTTP server simple and the communication simple too
d
I'm a big fan of graphql for bundling many typed requests into one. But that may be way overkill
d
We won't cram Nginx into this firmware - it alone would be multiple times bigger than the whole firmware 🙂
As much as I understand and like the ideas, it must be simple
We're working with a device that's limited in computational power and we have limited storage. Also, 128 MB of RAM does not help either
I think the backend API should be a pretty simple one, REST-based probably, but maybe even not. JSON is native for JS and sending request as JSON objects to the backend might be all we need. But I also haven't thought about it much yet
d
@DhanOS (Daniel Kukiela) related to your UI suggestion. I'm unfamiliar with proxmox but have used vmware esxi, They appear to have similar interfaces? If you have any screens you're interested in emulating, I can take a look. Overall it looks like possibly grouping multiple turing pi2 boards and nodes in a left nav section and a detail section on the right is what you're after? with possible group selection and actions on those selections?
p
I'm guessing a Turing Pi datacenter would be the only users to really complain 🤣
This does look interesting for future research: https://github.com/microsoft/msquic
d
Someone is going to build a giant turing pi2 multi board cluster at some point.
d
I only mean I might be more into raw-looking interfaces instead of the (currently super common) designs that put form before function. Like some interfaces are wasting too much space to make them look good, but this can also make them more chaotic. I think we can make an interface for me later, when we have the backend and some new frontend working 😄
Maybe not giant, but i want 4 boards and fit them into this case:
😄
d
You saying my interface is too pretty?
d
No, I'm saying people have different tastes and this is why I like the approach of having multiple options so everyone can choose what they want. I haven't seen much of your interface yet to say I like it or not 😛
d
Just poking fun. 😛
d
Speaking of multi-board solutions - I would also love to see the interface to be able to either manage multiple boards at once or the list of machines could redirect to the interfaces of different machines. Managing multiple machines at once sounds better since you could show all nodes as a single list
d
That is part of the intent with my interface, as I have two boards. It looks like @phearzero's supports that as well.
p
Also @destomes checkout #1097689024325492887 I did a info dump the other day. That's at least where I left off mentally
d
We can talk about the backend more, we can voice talk about it too, but when you think about it on your own, remember how limited we are with hardware - this is a pretty slow chip with just a little bit of RAM and storage and we have to be sane about the implementation. I'm still down for a simple HTTP server crafted specifically for this purpose. I built 2 small servers already - one HTTP server in Python (yes, I know, but Python was used there for other things too) where I integrated the GET and POST methods into the other parts of the system and it worked well, but also a socket server into... GTA5 to let different clients (also a Twitch IRC connector) interact with it. Both worked great.
p
It's a fun target, I'm sure the disk IO is not the greatest as well. That's freaking awesome! Services are a lot of fun and can be really rewarding! Heads up, I have high functioning autism and CompSci is my main special interest. Spent most of my time in telemetry/analytics and crypto. Lots of system/web whatevers, just really enjoy the field and the people are amazing.
d
I must admit this makes me a little bit more excited about this project. I'll still need a few days before I get to that, but can't wait for the development 🙂
t
Based on examination of bmc, webserver and tpi, they all seem to use GPIO directly. There is no central daemon/API that coordinates all requests. IMHO, this should probably be bmc. Obviously, requests need to be serialized with mutexes to keep API requests from stepping on each other. It looks like webserver does this to some, decentralized degree. You can see this by turning on all nodes with tpi/webserver, then press the power button. The power button code doesn't check current state and just blindly turns everything on again (no effect).
d
Yeah, and this is why I thought of integrating picocom into the same service so we can have a single connection open and interact with it via command, via web interface or have an ability for the service itself to use it (for additional features like node detection). Indeed some coordination in the web interface is needed and I already thought of this a bit and I am not sure about the implementation - websockets are a bit too much for this, so maybe the front end can "ask" the backend every given number of seconds, maybe every 1 second, and update the states. That'd be useful to also show other information live, for example IP addresses of the nodes which I'd also like to add there, or the BMC system information.
Sky is the limit
And the memory
(both RAM and disk 😛 )
t
Yeah. The RPC/listener should just acknowledge the request and block until the request is complete. The clients should respond with "busy" while a request is being processed. That avoids having to implement queuing and shifts the UI logic to the clients: tpi and webserver. They can respond to the user with an appropriate message to retry in 15 seconds. I'm not certain whether retry should be automatically attempted.
r
Please consider the "No UI" option as well. Personally I am only interested in the API and associated console tools for DevOps and if we end up with a configuration system that allows for the BMC image to be configured with different options, it would be great to be able to skip some components in favour of others.
d
I would opt for a timeout, so the request would wait (if blocked), but only up to a given amount of time
I would not deny the request unless the state does not match and it has to
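A rough sketch of "wait, but only up to a timeout", written as a shell illustration of the idea rather than how the C++ `bmc` process would do it. The lock path and the helper names `acquire_lock` and `release_lock` are assumptions made for this example.

```shell
#!/bin/sh
# Sketch: wait for a lock up to a timeout instead of answering "busy" right away.
# mkdir is atomic, so only one caller can create the lock directory at a time.

acquire_lock() {
    lock_dir="$1"      # hypothetical lock location, e.g. /tmp/bmc.lock
    timeout="$2"       # maximum number of seconds to wait
    waited=0
    until mkdir "$lock_dir" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1   # gave up: the caller can now report "busy"
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

release_lock() {
    rmdir "$1"
}
```

A caller would wrap each request as `acquire_lock /tmp/bmc.lock 15 && { do_request; release_lock /tmp/bmc.lock; }`, where `do_request` stands for whatever the request actually does.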
r
Also, here is a controversial alternative to an API: why not consider MQTT? If we consider us to be in the IIOT space, MQTT based on Sparkplug B would be a natural and modern choice... 😉 https://sparkplug.eclipse.org/
d
Some sort of GUI will be needed at least for BMC upgrade. A very simple one might be used for this purpose. But good point
It's not controversial. I'm considering different ways of managing the boards 🙂
r
Any MQTT server has the advantage of handling synchronisation of the commands, comes naturally with the queue/topic concept.
d
But this will have to also integrate nicely with the other ways of controlling the board, so it's not down to the MQTT protocol only
t
Would you accept and queue the request in the centralized service or just tell the client to resend. Timeouts might frustrate users (it's a feature, not a bug.)
r
No, but you could have the API publish commands to the MQTT server and thus handling the queuing aspect naturally.
t
As long as the interface is public, people can create their own "clients".
d
This is quite far into the development and I haven't thought about this well yet. I don't think people are going to send requests to the BMC in different ways at the same time, so some simple system would do IMO (simple synchronization primitive)
But still, you'll have a way to connect and interact with the BMC in other ways, not only by MQTT, which is different with typical implementations of MQTT
t
I don't expect you to implement this stuff in the CE firmware. This is something for the Rust programmer(s).
d
This is the place we're discussing the CE firmware features 🙂
r
Yes, of course. What I meant is we need a queuing aspect for the commands whether they come from any of the tools you choose and MQTT servers provide a queuing system OOTB.
t
Would the MQTT run on the BMC or somewhere else? The BMC has both limited memory and storage.
p
I've run MQTT on Pi zeros, tons of implementations out there. The zero has a bit more memory but it hasn't complained yet. It's one of the setups supported by Adafruit and Circuit/MicroPython
r
I believe there are MQTT servers running on the Pico as a python lib and I am pretty sure we can find something similar for Rust as well... 🙂
d
Although we're not going with Rust here, I don't think so 🙂
r
Go?
d
C++ is what's being used currently
r
Ugh...
d
I'd love to have Python, but the memory footprint might be too big
p
Who's on team Zig? lol am I alone?
d
The whole firmware is C++
p
MicroPython was something I wanted to mess with on it at somepoint
d
But this is for microcontrollers, it compiles into the assembly
I will play with MicroPython with raspberry Pi Pico
r
Yeah, I know. But even Linux core developers are now accepting Rust in the build tree... Just saying...
d
But we're expanding what's currently done. Mixing C++ and other language in a single script is rather not possible 🙂
If this was a library or separate tool, then sure
r
I agree, keep it simple and stupid. You will just have very limited help from me with C++ as I will not invest time to (re)learn the language. Call it religious reasons... In any case, the question is moot as I still can't build on my M1 Mac. 🙁
d
I mean I don't know where we are going with this. I would have to learn Rust or Go, or whatever you propose 😉 Then, I don't think we want to start with re-writing the firmware from scratch in a different language - this will put us into several months of work, and the team is working on a new firmware already and most of the people will probably move to it in the future, we'll see. Then, the aim here is to provide features before the new firmware. But who knows where we end up with this, maybe we'll be developing this firmware further and further indeed and have it as an alternative. Then maybe we decide to rewrite it. For now, the goal is to provide more functionality. As for building, I'm using VMware Player (free) with Ubuntu 22.04 installed in it.
r
Don't worry - I agree we should not rewrite anything. Was just carried away by the mention of Rust.
I will try VM player - would love to be able to join the fun and roll a version of my own.
d
And I'm even personally Keen to try Rust. I've been using so many programming languages in my life and this one seems like something worth looking at
m
Once you get used to Rust, then you will love Rust 😅
d
I've heard this so many times in my life already speaking of all kind of programming languages 😄
m
I've been using C++ for years but now having some affairs with Rust 😍. Then wishing C++ had a simple toolchain like Rust, just cargo install and run 🥹
d
This reminds me how I swapped PHP for Python. Maybe I'll indeed like Rust as much too 🙂
m
well, having a good editor that can show data type of each line of code will help a lot 😄
g
I taught myself Perl years ago but don't remember much anymore. I currently use RPG and C# for work.
k
Has been that way for me with Elixir, love a ton of languages but once I touched Elixir, I'm like, nope, don't need anything else at this point
(Though I do need to give Rust a proper try one day 'cause there's a lot of good synergy between Elixir and Rust)
r
Checked my own statement, and it seems like gcc support for Rust still has some ways to go, target date would appear to be this summer. So even if we would consider it, mixing Rust and C++ would entail mixing the gcc and rustc toolchains... which seems challenging.
Must have been rambling last night, there would be no MQTT server on the BMC. For MQTT support there would be an MQTT client that pulls messages off an MQTT broker and then executes them. Also forget anything about this being helpful for the other means of configuring the BMC and the nodes, as the synchronisation I had in my head happens in the MQTT server. Of course we would only pull one message at a time, but to benefit from this we would need the server... (duh). In a pureplay MQTT scenario there would still be a need for bootstrapping the MQTT client configuration (server, credentials, etc) and also for flashing the BMC, so at the very minimum there should be a REST API for this. In summary: MQTT support would be complementary, but as we are only talking about a thin client that would rely on the same logic for handling any other management commands, it could probably be very lightweight. The advantage of MQTT, in addition to being a modern way of managing your IIOT device, would be that it should also support streaming of logs and any other data we could think of coming out of the RPI.
p
Yay for JS api clients 🎉: https://www.npmjs.com/package/turing-pi-js
r
Looks like Rust is a thing for the official firmware for the BMC at least: > We have a basic firmware built on C which is a starting point for building a state of the art firmware in Rust https://github.com/wenyi0421/turing-pi https://turingpi.com/jobs/firmware-developer/
j
To go just a tad further on this tangent. Do you guys recommend any resource for learning Rust? I have some C/C++ background from uni days and mostly work with Python nowadays.
d
Started learning myself, received 2nd Ed of "The Rust Programming Language", Steve Klabnik and Carol Nichols, (Rust 2021). I am enjoying their presentation
m
You can try Rustlings next https://github.com/rust-lang/rustlings
k
Oh that reminds me (and I promise to stop derailing this thread after this message 🤭) that I wanted to one day learn a language using Exercism only, it’s a brilliant website: https://exercism.org/tracks/rust
p
I've got `rtty` baked into my firmware but `rttys` and `gotty` are pretty heavy. Looking into other options but a lot of them are pretty heavy. Anyone know of any other good `xterm.js` backends that are lightweight?
s
I've noticed that the 12V fan-header only gets powered when the first node is powered-up, but then stays on so long as the system is powered, so far as I can tell:
* Can we have independent control of the fan header?
* Can there be a run-on option where the fan header is powered for a user-specified number of seconds after the last node is powered-down?
(It's possible that the latter is already the case - but not for a customisable period, and I didn't wait for long enough!)
d
This header is connected to +12V from the power supply. There's no way to control it in any way - neither to control fan speed nor the on/off state. The fact it does not turn off with the last node is a bug in the firmware. The Community Edition firmware is going to fix this soon. If you turn off the nodes using the power button or `key1` on the board, the fan will turn off
m
Hi all, I am new here to the discussion. I had a few thoughts though to get more out of the BMC. First, busybox has a built-in httpd server. So could that be used in order to not use more memory? Second, I would like to try building a firmware with dropbear ssh instead of openssh to get a bit more back as well. Any thoughts?
d
About the HTTP server: the web panel is tightly integrated with the firmware so the actions invoked by the panel or the `tpi` tool (which also queries the webservice) can actually be executed. With a separate webserver it's hard to communicate with the firmware somehow anyway. I'm not familiar with dropbear ssh. What would be the advantages over openssh?
m
Dropbear is a much smaller binary than openssh.
d
Even as small as 110kB, hmmm, worth looking at probably. OpenSSH currently is like 6MB iirc
Thank you for the suggestion!
w
I notice mentions around the place of uClibc, musl etc. Seems the existing distro uses ordinary glibc. Not sure how much space it would save but interesting - and might actually make cross compiling easier because there are so many glibc versions, idk
d
Some experiments might have been done
For example, the bmc binary uses about 4 MB in the memory iirc, but I did not check yet how much of that is an actual binary and how much variables (or memory allocations in general)
This is more than 1/3rd of the available memory and even if we still have free memory to work with, thinking of lowering the memory footprint is definitely a good idea since this will let us run more stuff in the future
p
swupdater is pretty heavy and it comes with its own webserver. I think it's port 8080 on the BMC
That service does have websockets however, might be useful. Not sure if it's goahead or not.
I kill that pid every time I reboot, going to test if I can flash with USB still and I might rm -rf it completely from my firmware builds
d
Hmmm, having it running all the time might not be the best thing. Thanks for pointing it out. The CE might only run it when necessary. Although I would like to have a way to perform updates more in a linux style than by OTA
Flash with USB? You mean the nodes? It's independent from the node flashing
Which process is this?
p
The BMC flash port for PhoenixSuit (MicroUSB). Bricked my BMC a few times yesterday and had to recover.
bash
/usr/bin/swupdate -v -w -r /var/www/swupdate -p 8080
It's a few MB in mem. It also has another process that starts using CPU once that service is down. I normally kill it as well
bash
/usr/bin/swupdate-progress
d
OK, thank you. I'll look at it and most likely just make the processes run only on OTA (like when you submit a firmware, run these binaries - since we do not autostart them and the software update causes the BMC restart, they won't start again automatically after)
p
This week I should have more free time, I'm going to update the readme for building on my fork, feel free to borrow from it. Been working on web-api changes and experimenting, it's all in WIP shell scripts right now which might be easier for most people. They are all called `fix.sh` 🤣
d
Thank you! I'll take a look and see which parts would be useful for the CE version 🙂
Is this a fork of my repo or the original one?
p
It can be either one, just little things like looking in the buildroot output before copying the default configs/other libs. All based on the same README.md in both so it should be pretty compatible
d
I'm asking where to look to find your fork 🙂
p
I have both right now: Original: https://github.com/PhearZero/turing-pi CE: https://github.com/Telluric/turing-pi-2-community-edition-firmware My main work is off the Original but planning to port to CE relatively soon.
d
Ok, thank you! I'll take a look
p
It also is still on my workstation, just getting comfy with the project really. I'll ping you when I have something pushed.
Also investigated authentication for the web server. Luckily it's already built into goahead and you can find an example route at `http://BMCIP.local/auth/basic/admin/`. The routes/configs are stored on the BMC under `/mnt/routes.tx` and `/mnt/auth.txt`.
The buildroot defaults are here: https://github.com/Telluric/turing-pi-2-community-edition-firmware/tree/master/br2t113pro/board/100ask/rootfs_overlay/mnt helpful link: https://www.embedthis.com/goahead/doc/ref/api/goahead.html
d
This is helpful. Thank you.
w
Is xinetd still a thing or did it get eaten by systemd?
Feels like "service that starts up when you try to connect to it" is a good fit for this kinda thing
d
Hmmm. In this case we also must start the service after successful firmware file update, so not a problem
w
Btw I haven't successfully done this yet, latest guess is it seems like a CM4 needs to be specially configured to put Bluetooth on mini UART to receive serial breaks
p
The process kicks off in `init.d`. Gives me feeellz from the past, haven't used `init.d` in ages. The process is in `/etc/init.d/S80swupdate`. I also question the use of `collectd` and haven't really dug into what exactly it's trying to grab. It's a long-lived service on boot as well
Alright, got my frontend running on the BMC. (Simplified it for this iteration) I'll post the docs tomorrow on how to deploy if anyone wants to switch up the web look.
d
May we discuss a way for all of the interfaces to switch between them? I'd love to add this one to the firmware, but we need a way to switch between them
Oh, did you add this already in the top-right corner?
p
That's in the `next` branch right now for a "global" dashboard. Wanted a super simple implementation right now but I can get a very basic redirect switch in there tmrw.
d
No rush, I need a few days still to integrate it
It'll be also good if we can somehow ensure same functionalities of all of the interfaces
p
Yea a good open discussion would be awesome. The challenge is always in replicating state. Super easy to run a service on one of the nodes and just always point to it in the browser. If each of the BMC's have to maintain a list it becomes less trivial. This design is specifically for the local interface on a BMC using the current web server.
d
I can handle the logic of interconnecting the boards, this is the easy part for me as opposed to all this HTML/CSS magic. The interfaces should be able to identify the boards and the nodes and I'll make sure to have the right data. I don't have the final idea about that yet, though
p
Yea that is exactly what's needed, a "BMC discovery" layer that maintains a peers list. It can be a part of the firmware or as a service that is run anywhere. Creating a service would be a good start and I'm pretty sure that is what @destomes was working on in Python. That way we just have the Python API as an input in the basic UI on the firmware. Once you add the discovery layer/api url, it will populate all the other nodes/full dashboard. If we find it works well we can look at porting it to the BMC in C/C++ with probably limited functionality.
Once we have a list of services we can do 99% of the work in the browsers to fetch state, add a few endpoints to the api to reduce requests. Then start to extend the bmc api to expose more functionality.
d
It must be a part of the firmware
The only thing in Python I've seen was a webservice wrapper, but I could have missed something
p
Sweet then that should be one of our first new endpoints on the BMC. /peers
d
I'll add interconnection directly to the bmc process
I'm not sure autodiscovery is a way, but can be helpful. I want to let people set the BMC hostnames and then we could have a way to cluster the boards by the host names. mDNS should do the rest
So, what I mean is you, as a user, will add hostnames to the cluster of the boards
They will then propagate across all clustered boards
p
Yea that would work great, a lot less to filter too on a busy network. Some kind of name convention or something similar. You enter a TLD like `amazing-cluster.local` and let the boards be subdomains.
d
Then, each board's interface could be used or we can make it so one of the boards is the master. But I like the idea of each board having its own interface, since the master board could be offline and this should not cut you out of the interface
p
100% agree and each one just pulls the current state from the network. Masterless is my favorite topology 🕺. Super easy from the front end perspective, especially once we have cleaner routes/more concise endpoints.
The next version has multi BMC support and hopefully TTY support. Been digging for a few days trying to find a decent WebTTY project for embedded. I'm running a slimmed down `tty-share` on the BMC but it's still really heavy.
d
Hmmm. The idea was to show a log of the UART output so you can scroll through it and see all of the messages. Will this allow for it? I did not think of a full TTY yet.
p
Yea we can just match WebTTY, it's what vscode/rstudio etc etc uses for term. We can even emulate other terms
d
A TTY with multiple features would be a nice thing. I'll have to see how it works. But we also need to integrate it with the BMC service so it reuses already established serial connections
We definitely should not establish multiple connections and let multiple clients communicate at once. I see how this can cause issues
p
Yea just one at a time and only for convenience.
w
I volunteer as a beta tester
+1, my primary use case is to check logs, interactive use a distant second
w
lol, systemd is the last thing the BMC needs
p
The simple UI is stable (it doesn't have TTY in this version, just an updated UI): https://github.com/PhearZero/turing-pi-ui
s
Do you have any screenshots, or a demo instance? I'd also strongly suggest moving `/mnt/var/www` out of the way rather than deleting it (there is still ~6.7MB free on the BMC filesystem, plus you could move one or the other directory trees or the `build.tar` to SD card - and you could even `tar -cjpf - build | ssh $BMC_IP 'tar -xjvpf - -C /mnt/var'` to avoid the tape archive itself using any of this space), as it gives testers a way to roll back without having to reflash their firmware...
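A small sketch of the "move it out of the way" idea. The helpers `backup_www` and `restore_www` and the `.bak` suffix are hypothetical; only the `/mnt/var/www` path comes from the discussion.

```shell
#!/bin/sh
# Sketch: rename the existing web root instead of deleting it, so it can be restored later.
# Pass the directory without a trailing slash.

backup_www() {
    www_dir="$1"                 # e.g. /mnt/var/www
    backup_dir="${www_dir}.bak"  # hypothetical backup name
    if [ ! -d "$www_dir" ]; then
        echo "no such directory: $www_dir" >&2
        return 1
    fi
    if [ -e "$backup_dir" ]; then
        echo "backup already exists: $backup_dir" >&2
        return 1
    fi
    mv "$www_dir" "$backup_dir"
}

restore_www() {
    www_dir="$1"
    backup_dir="${www_dir}.bak"
    if [ ! -d "$backup_dir" ]; then
        echo "no backup found: $backup_dir" >&2
        return 1
    fi
    rm -rf "$www_dir"            # drop the new UI
    mv "$backup_dir" "$www_dir"  # put the original back
}
```

Because the backup sits next to the original on the same filesystem, the `mv` is a rename and needs no extra space; the new UI is then unpacked into a fresh `www` directory.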
p
The demo instance doesn't have any mocked data but here it is: https://turing-pi-ui.vercel.app/ Sure thing, I can definitely back up `www` and add the pipe.
s
Ah, that's neat actually! From a UI/UX point of view, are there any additional types of input control available which can differentiate between "USB 2.0" (exactly one must be active at all times) and "Power" (any number zero or more can be active at once)? I guess that SD card details are all that's missing to reach the same functionality as the existing firmware? What might be neat (and already planned?) is to provide a way for nodes to push status updates to the BMC to reflect - for starters - an alarm-state (or even the active run-level?) which might be a case of accepting a POST-request and writing the provided contents to a file to later be read by the web-UI (if the built-in web-server has this capability)?
p
Just pushed the update to the shell script, thanks for the feedback 💗!
1. The USB 2.0 switch acts as a single toggle when deployed, only one will ever be active and if something fails it reverts to the current state of the BMC.
2. Good point, I always forget about the SD card! Adding that to the missing features, should be easy to integrate.
3. Yea agreed! I think that is what drove a lot of us into the custom/community firmware side. As it is now, everyone is in "brainstorm" mode. The community firmware is maintained by @DhanOS (Daniel Kukiela) and is the definitive source for all things community firmware related. The main repo is here: https://github.com/daniel-kukiela/turing-pi-2-community-edition-firmware
s
Left a couple of comments!
d
Speaking of the USB, I proposed radio buttons. As for the node-BMC communication, I've already been thinking of utilizing the UART for this purpose. The nodes can return some statuses even now and we can read them, and the same will work the other way, but with some service account on the nodes we could even log in (although this might not be necessary, just some service "talking" with the BMC)
One more thing as for the USB mode - once we add a way to recognize or set which node has which module inserted, that'll mean we'll have different possibilities for the USB mode settings - Jetson devices, for example, do not support the host mode. Ideally I'll split USB host/device from recovery mode and let them be set independently. Recovery mode can be enabled/disabled on a per-node basis like power, and the USB mode can be changed only for CM4 nodes (Jetson devices should have the drop-down list locked to device)
s
Presumably the BMC can't actually read any USB data (only switch the connection) and so can't retrieve any USB device IDs on which to base this?
(Unless it's also possible to briefly switch the USB2.0 connection directly from the node to the BMC, disabling the external port?)
d
This could actually be a way of recognizing them automatically, but could also be in conflict with how someone uses the board. My way is to follow what the original firmware is doing - parse the UART logs and figure out the node type this way
it could be, yes, but this is what I mean - someone might be relying on some USB device availability maybe
p
I'm agnostic on toggle vs radio, anyone have any opinions? I remember the radio suggestion the other day but forgot to try it out.
s
👆This is better!
d
Yeah, I like it more too 🙂
Things we could add in the future: - board name (like when we use multiple boards) - host name (the name of host in the OS, read through UART) - ip address (same, read through UART or in other way) - another toggle for the recovery mode (when/if we split USB mode from the recovery mode - currently this toggles them both) - with multiple boards we'll need also a way to group the radio buttons across a single node (maybe using a color?)
Actually, I must admit I like it a lot 🙂
p
For multiple boards I was going to have headers with the server info and subtables. We could put actions/buttons on the header rows as well
d
Hmmm, maybe that'd be better. I don't know. I imagined showing all nodes more like they are part of a bigger cluster, but maybe splitting them by boards is a better idea indeed
p
Won't be much to mock up either one/both once I'm back into the multi-server version.
k
Copy-pasting a note from #754950670175436848 in here - when editing `S99hello.sh` (or anything in the `/etc/init.d` directory), be aware that the scripts in there are run on any run level change. What that means, specifically, is that any commands you add in there will be run both on BMC boot as well as just prior to a BMC safe shutdown (via `shutdown -h now` or similar). If you want to run commands only on startup, check the `$1` variable for the `start` value; similarly `$1` will equal `stop` when it's time to shut down the service. Ref: https://discord.com/channels/754950670175436841/754950670175436848/1099875572051615794
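A minimal sketch of that `$1` check, assuming a hypothetical init script; the function name `run_action` and the echoed messages are placeholders for whatever the script should actually do.

```shell
#!/bin/sh
# Hypothetical init script body: only the "$1" start/stop check comes from the note above.

run_action() {
    case "$1" in
        start)
            # Commands that should run only on BMC boot go here.
            echo "starting"
            ;;
        stop)
            # Commands that should run only before shutdown go here.
            echo "stopping"
            ;;
        *)
            echo "usage: $0 {start|stop}" >&2
            return 1
            ;;
    esac
}

# init passes the action as the first argument; dispatch only when one was given.
if [ "$#" -gt 0 ]; then
    run_action "$1"
fi
```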
p
d
You crammed many different things into one PR. I'll review them and let you know. Thank you! Hope you don't mind if I have a different perspective on some of your changes (this is why it's easier to make a separate PR for each thing, now I have all or nothing 😄 )
I'm not a fan of issue templates, btw. This is a small project and I mostly like when people just describe their problem instead of trying to fit it artificially into a template
I have a script to set versions already, you added your own, breaking mine 🙂
As for the rest - I'll review them later, but there are sadly things I cannot merge. I'm not sure what to do in such situation
p
It's just moved to ./scripts/set_version.sh and I added a ./tpi.sh with an easier way for new people to try to build. It's kind of a black box for most people in its current state
d
I also want to retain raw commands in the main readme - with a script file it's harder to find where something errors, since the following commands are still going to be executed. And breaking occurs a lot
If we want to use scripts, we have to ensure that each command finishes with the right status and only then execute the next one. Also, during development, you do not run all the commands, more like use them selectively, hmmm I'm not sure how to handle this via the scripts
p
Check it out first when you get some free time. It's split into 4 parts and the aim is to get every developer's working directory to the point where they can easily rebuild. There are checks for errors with tips on what is missing etc.
d
I will check, the only problem I have now is I'm not sure how to handle the fact that I have a different view on some changes that you made... Ehhh, sorry, it's hard for me to find the right words. The thing is I also have my own ideas for some of the things and now I'm not feeling comfortable, because I don't want to not merge your request, you put time and effort into it, but also some of the things are not fully in line with how I want to approach them. Hopefully this makes sense to you.
p
Yea it's no big deal at all, you can manage the project however you like. It's not wasted energy since I'm already maintaining a fork which will leverage them. It's really ok and I appreciate everything you are doing, I was just hoping to help other people join the fun on the CE side. Seems like the main issue is discovery, that was something that would be obfuscated by a build script. The compromise would be to have both, just add a second markdown with the full line by line details and if something fails then try the line by line guide. For me I need to run both mkfw and set_version constantly so it made sense to me to have an entrypoint that does just that. Figured throwing in the build script would make it less intimidating
d
I'll definitely check your pull request and see about the parts I like. But what should I do, just copy the changes over?
p
We can discuss on the PR. Evaluate it and we can even drop that one in favor of one we both like. It's not critical to me to have it merged, if you see something you like I'm more than happy to work towards a solution 🕺!
This is what I do to relax, definitely don't want you to be stressed over the code! 💗
s
Much as it’s a pain - and I’ve definitely tripped over this myself in the past, professionally - it’s best-practice to raise PRs with one intended outcome per PR… so in this case, perhaps one for the GitHub-related updates such as Issue Templates, one for the README, either one per script or one with all of the script updates/additions (and a GHA
shellcheck
of these would be awesome too!), in order to allow each to be considered independently.
w
set -eu
would do the trick right?
Ahh the eternal joy of trying to balance action and forward movement against having that movement go in the right direction. A challenge for any project
I mean you know, at work the standard approach would be to write a design document describing the proposed changes, alternatives considered, considerations and having the decision-maker approve the approach in advance. But even in that corporate environment I tell my reports that they can start development before the doc is approved provided they accept the risk that the approver might want changes
d
There is no single right approach here, but multiple right approaches, each with some different direction would also not lead to anything coherent
I also do not mean writing documents, but once the project starts going in one direction, everyone will tune into it
I've worked at a corporation, I'm not right now and this feels better
I'm not sure I understand what you mean here, especially the
against having that movement go in the right direction
part
Oh, ok, I think I confused the meaning of
against
in this context
b
OIC
Very nice, I'll have to look into this, maybe I can help contribute some of the things to it to get some of those TODO's checked off
d
I'll be putting all feature requests and ideas into the GH issues, get it going and then see where it'll take us 🙂
s
Also, if at all possible, a user should be able to reboot the BMC without any powered-on nodes losing power.
d
This is something we'll check, to see if it's possible
p
I'll be maintaining my fork alongside CE/Official and feel free to borrow from it. Currently adding a new REST endpoint
v2
which cleans up the API calls as well as staging the API/UI for Authentication. https://github.com/PhearZero/zero-pi-2
I wonder if we could expose the UART over something like
ser2net
https://github.com/longshine/ser2nets <-- same thing but with WebSocket support
d
We probably can, but this would not work for the 1st stage bootloader and who knows how big a part of the second stage bootloader. And when it starts working SSH will be available already (if you want to use it instead)
w
I mean for the node UARTs
not sure I understand why what stage the node is in booting would impact whether you could forward its serial port over some arbitrary protocol
d
Because you'd have to somehow add this to the firmware
And even more make it work with a network that comes only later, during late 2nd stage bootloader
w
I'm proposing installing ser2net[s] on the BMC, to read the UART from the modules themselves
not the BMC UART, which seems to be what you think I mean
d
No, I actually thought you mean to install this on the nodes. And now I see that'd make no sense. Just woke up 😄
Yes, it'd make sense to put that on the BMC. What could be the use case (like compared to the picocom over ssh)?
w
I mean, not having to ssh into the BMC to read UART
especially if websocket support was available, you could do it from the web
d
I'm asking because people usually ask to keep things "behind" authentication of some sort
Even root logins over SSH are meh
w
indeed, websocket support would mean being able to do the usual mutual TLS or something
d
TLS means encryption, not necessarily authentication. I need to read through that more. ser2nets supports http for websockets, which means it might support authentication and then upgrade the connection to the sockets maybe
If authentication is not supported, that would have to be an option
But yes, definitely something to look at
w
idk I've just been interested in a bunch of these options for setting up remote access to the serial ports
d
And this makes sense indeed. I can, however, see how others might not expose additional things in a manner that does not involve authentication, so we have to take this into account
w
I suppose we're all free to add whatever we want to our own installations
d
I also can see adding this (and other features) more in a way where you choose what you want to have installed/running
This way you can have this without any authentication. I would personally also use it this way
I'm using UART a lot and also I don't have to have a password on everything
I also like the idea
After a bit of thinking I think we could have the web interface but also a command of some sort that will let us enable/disable and start/stop services
p
I'm actually about to push UART to turing-pi-ui. The firmware already has it baked in and TTY is about 70% feature complete.
My weekend goal is to finalize all of the missing features from the current web ui. The main one is uart, I used a few heavy tty protocols and didn't even think that xterm.js could read from /api/bmc?opt=get&type=uart Just a heads up, this does mean that all nodes are exposed on the local network with RPC to the local nodes at the /api/bmc?opt=set&type=uart&node=0&cmd="Do something dangerous". Generally the UART should prompt with login but there could be some cases where the uart does not have shell login
Should work with CE and Official firmware (no dependencies needed at the moment): Here is the demo render: https://turing-pi-ui.vercel.app/ Repo: https://github.com/PhearZero/turing-pi-ui
d
Many people are going to love this 🙂
j
Just posting another fork of the original repository. My aim was to look at if we can extend the networking features of the BMC linux: https://github.com/j0ju/turing-pi/tree/main As an idea I also ported my random persistent MAC feature to the Community Edition and created a PR: https://github.com/daniel-kukiela/turing-pi-2-community-edition-firmware/pull/7
p
I'm hoping to post it in chat this weekend after I get firmware uploads 🕺 and it will be its official "1.0.0". Super jazzed about it
d
Hi! Thank you for the pull request! There are 4 things I want to mention. First, you should not use the Allwinner prefix. This is only being used by them to assign a MAC to a device. When you are assigning a MAC on your own, you should use one of the ranges for the user-assigned MACs. The ranges are:
x2-xx-xx-xx-xx-xx
x6-xx-xx-xx-xx-xx
xA-xx-xx-xx-xx-xx
xE-xx-xx-xx-xx-xx
Second - the goal is to be able to set a given (or random) MAC via the command and via the web interface. Can you implement a similar thing in the
bmc
since this will be the desired way? Third - I only briefly checked your code - isn't it missing
ifup
? Fourth - if you want to modify a file like
/etc/network/interfaces
, it's better to hook the image creation and "regex" the change into the file instead of replacing it entirely. This ensures multiple updates can be done conditionally. Additional question - what's the benefit of using the environment variable compared to simply adding
hwaddr ether
into the
/etc/network/interfaces
And I am going to integrate it into the firmware in the coming days so everyone can simply flash new firmware and use it 🙂
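For the random part, a minimal sketch of generating a MAC from the user-assigned (locally administered) ranges listed above: take six random bytes, then set bit 1 and clear bit 0 of the first octet, which always gives a second hex digit of 2, 6, a or e. The function name is made up for illustration, and the sketch assumes `od` and `/dev/urandom` are available on the BMC.

```shell
#!/bin/sh
# Hypothetical helper: print a random, locally administered, unicast MAC.
random_local_mac() {
    # Six random bytes as two-digit hex words, split into $1..$6.
    set -- $(od -An -N6 -tx1 /dev/urandom)
    # Set the "locally administered" bit (0x02), clear the multicast bit (0x01).
    first=$(( (0x$1 | 0x02) & 0xfe ))
    printf '%02x:%s:%s:%s:%s:%s\n' "$first" "$2" "$3" "$4" "$5" "$6"
}

# Usage, combined with the command mentioned later in the thread:
#   fw_setenv mac_eth0 "$(random_local_mac)"
```

Generating the address once and storing it keeps it stable across reboots; generating it on every boot would give the board a new MAC each time.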
j
1. I know that there are user ranges for MACs. Surely I can change that. Any range preference? The Allwinner OUI makes much sense, to me, better would be an OUI from TuringPI itself. 2.
fw_setenv mac_eth0 xx:xx:xx:xx:xx:xx
is the commandline for the shell. This can surely be wrapped into an API call to be set on demand. 3. That point I do not understand. What do you mean by
ifup
is missing? This code is executed by the ifupdown framework itself on
ifup -a
eg. during the network init code. 4. Currently it is dependent on the defined
uboot-hwaddress yes
in the supplied
interfaces
file. 5. If the user changes the MAC in the filesystem it is lost after an update. On the other hand with the script if-pre-up.d it is set from uboot env. If the MAC address is persisted in /etc/network/interfaces it would change on updates. The uboot.env is persistent as long as you update only via the web interface.
About the 4th. Root logins for 1.0.1 based releases should work out of the Box, although that what the original firmware is doing, also allows empty passwords for root 😕
Regarding the storage of MAC addresses in UBoot: the original UBoot of TinaLinux, which seems to be used as the base for this project, uses the UBoot environment for storing the MAC for LAN, WiFi and BT. The code is still in the UBoot bootcmd, but it is not used by the BMC Linux, nor are there proper values defined. So this is NIH.
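A rough sketch of the hook described in points 2 to 5, not the code from the PR: read the MAC from the U-Boot environment, check its format, and apply it before the interface comes up. It assumes the variable name `mac_eth0` from the thread and that `fw_printenv -n` prints only the value; the function names are hypothetical.

```shell
#!/bin/sh
# Sketch of an if-pre-up.d style hook (function names are illustrative).

# Accept only six colon-separated hex octets.
is_valid_mac() {
    echo "$1" | grep -Eq '^[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}$'
}

apply_uboot_mac() {
    iface=$1
    # If the variable is not set, leave the interface untouched.
    mac=$(fw_printenv -n "mac_$iface" 2>/dev/null) || return 0
    is_valid_mac "$mac" || { echo "$iface: ignoring invalid MAC '$mac'" >&2; return 0; }
    # The address can only be changed while the link is down;
    # ifup brings the interface up again afterwards.
    ip link set dev "$iface" down
    ip link set dev "$iface" address "$mac"
}

# ifupdown exports $IFACE to hook scripts, so the hook body would be:
#   apply_uboot_mac "$IFACE"
```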
d
1. If Turing Pi had its own range, it'll most likely be utilized (and this can also be a thing in the future), but no matter Allwinner or Turing Pi, you should only use MACs from the user-assigned ranges since you are setting it by hand. For a quick hack to set a static MAC I used
12:34:56:78:9A:BC
, so probably the first range 🙂 2. Yes, but I also meant this whole part to randomly generate a MAC 3. I mean:
```sh
set_MAC() {
    ip link set down dev "$IFACE"
    ip link set addr "$MAC" dev "$IFACE"
    echo "$IFACE: set mac address $MAC" >&2
}
```
and I should mention
ip link set up
not
ifup
4. I understand and this is what I mean, I later provided an example of how you can only modify configuration files for the parts you need without overlaying a whole file 5. The goal is to have more linux-like updates, the config files will remain. Also, there is going to be an initial configuration file that you upload via the web interface or put onto the SD card, so the user will keep it for when it'll be needed anyway
They do. I forked the firmware where there was 1.0.1 and added root logins before 1.0.1 was a thing. 1.0.1 also does the same thing - overlays a whole config file instead of modifying it on the fly to change this one thing
I guess if Turing Pi wanted to set MACs from their range, if they had some, they'll use this indeed. We're setting MACs by hand, though and some sort of a new firmware is coming so I might want to avoid using this way, at least for now
Does this make sense?
I want to add so many features to this firmware, but I also want to ensure they're added in a coherent way, otherwise we'll end up in many different ideas that don't work well together
j
1./2. Will update the PR. 3. the ip link up happens during ifup process. This is executed during ifup-pre, before bringing up the interface. The ip link set down is just to ensure I can change the MAC address, as it is only possible if interface is down. 5. Then this solution is best, as it stores the MAC accessible via the uboot env from even if running from SD card, and it is not harmed besides from a full reflash. (.img)
d
3. Right, I understand 5. Well, let me say this again - there will be a single file containing all the settings, not just MAC, but much more. This is why I want to start developing it further again, so the direction is visible.
Ok, maybe not like a single file that contains the settings, but I hope it'll get clear once we add this to the firmware
j
This makes sense, there are many forks out there, some have barely understandable commits and the feature set and goal is unknown. I started from scratch for debugging fun and found your repo later, and the code a bit diverged.
d
The code has been barely touched by me, yet
We've collected a heck lot of feature requests here
My next step is to create GH issues for each issue or feature request for easier tracking and start implementing them
j
A single file is a bad idea. As the BMC is more a network device with some extras I like the config format and handling of OpenWRT.
d
Then, hopefully, more people will join and follow the same direction
I understand we do not have to agree on things, but there's an idea behind a single file.
The BMC is a general purpose CPU that'll be running all sorts of stuff. It's not just a network device. It'll be able to flash nodes in the future and set them up and manage them, for example
j
If you want a single file, in 1.0.0 there is a file /etc/tpi.cfg 😉 And even the BMC initscript is using this file, it is tinkering with the network interfaces after they are already brought up 😉
d
A single configuration file is meant to let the settings to be centralized and easily manageable via all sorts of interfaces like the web interface, a command of some sort or even via external systems if you wish to use them to set the cluster
/etc/tpi.cfg
is a 1.0.1 thing
j
Jupp, but as there are barely any docs on how that works from the BMC OS, it is currently an automotive chip, without HDMI, sound and the other bells and whistles.
d
And I do not want to implement it this way.
j
You are right, but the way it is implemented for tpi.cfg is not a good idea IMHO.
d
Well, I've heard this argument so many times already, and still don't know what this means. Sorry
I know, because there are as many ways to do one thing as many people you ask
The fact you don't like it does not make it bad automatically
j
eg. How is the Switch ASIC connected to the T113-S3 How the T113-S3 is connected to the Pi sockets.
d
Via I2C Via UART and LAN (through the switch)
j
Cool if you know this, which one of the I2C busses?
There are of course many roads to Rome. Doing the same thing twice, the second time by working against the framework (ifupdown), just feels bad from many perspectives, e.g. reliability and understandability
I mean on this page (https://help.turingpi.com/hc/en-us/articles/8685766680477-Specifications-and-I-O-Ports) you find many high level tech specs. Unfortunately I did not find more descriptive docs, e.g. which lanes are connected where.
d
My initial idea for the MAC and other settings is that the bmc process (or some other one) is going to read the config file during the boot procedure and compare various system settings and config files against this central setting repository. If any setting is different, it'll update it accordingly. Speaking of the MAC, for example, if it's set in a config file to be random and static, a single MAC should be generated and saved next to the common config file on the SD card. In case of a static MAC set by the user, this one will be used instead (from the config file). Then it's going to be set in the
/etc/network/interfaces
as
hwaddr ether this_given_mac
and on subsequent boots if this entry is missing or different than what's saved aside, it'll be updated. This means the
/etc/network/interfaces
will always contain the right MAC. The advantage of this is that when a board breaks, which can happen, or for whatever reason has to be swapped, you only move the SD card to a new board and it's automatically set to the same settings. If you happen to change anything controlled by this config file later, it'll be changed accordingly, and the change is kept persistent by checking everything on boot. At least this is the way I see this.
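The "check on boot and fix if different" step could look roughly like the sketch below, which edits only the hardware address entry of one `iface` stanza instead of overlaying the whole file. This is an illustration, not the planned bmc code: the function name is made up, and it writes the entry as `hwaddress ether`, which is the spelling ifupdown documents (the thread abbreviates it to `hwaddr ether`).

```shell
#!/bin/sh
# Hypothetical helper: make sure "iface <name>" carries exactly one
# "hwaddress ether <mac>" line, leaving the rest of the file untouched.
ensure_hwaddress() {
    file=$1; iface=$2; mac=$3
    tmp=$(mktemp) || return 1
    awk -v iface="$iface" -v mac="$mac" '
        # A line starting a new stanza decides whether we are in the target one.
        $1 == "iface" || $1 == "auto" || $1 ~ /^allow-/ || $1 == "mapping" || $1 == "source" {
            in_target = ($1 == "iface" && $2 == iface)
            print
            if (in_target) print "    hwaddress ether " mac
            next
        }
        # Drop any stale hwaddress line inside the target stanza.
        in_target && $1 == "hwaddress" { next }
        { print }
    ' "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back only when something changed, so repeated boots do not
    # rewrite the file (and the flash) for nothing.
    cmp -s "$tmp" "$file" || cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Usage:
#   ensure_hwaddress /etc/network/interfaces eth0 12:34:56:78:9a:bc
```

Running it twice with the same arguments leaves the file byte-for-byte identical, which is what makes it safe to call on every boot.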
Because not much is released and the Team is working on the entirely new firmware, they recently hired a dev for this. This is only an alpha firmware made only for the boards to work and my fork is only meant to be a thing until the new official firmware is out. If people choose to use it anyway I'll continue my work here. Speaking of hardware, it might be opened in the future fully or partially, this is not known yet
j
That makes sense. My opinion on the MAC addresses of hardware NICs: they should not change. Normally they are persisted in a kind of EEPROM or any other close-to-the-hardware storage. That's why I chose the UBoot env, too. If a user wants to change it later, this is persisted in the operating system's config storage. This can be the OneConfigFile or any other solution.
d
I agree here, but this is something the manufacturer is doing, not a user. The manufacturer sets MACs and they're even closely related to the serial number. If you as a user set it, it should not be stored as persistent IMO (meaning to override or set what you are not supposed to override or set) and should use the user-assigned MAC range. At least this is how I understand and see this.
j
Yeah. IMHO I do not care about MACs, as long as there are no collisions on the L2 ethernet segment and the chosen MACs are not from the reserved range. I would love it if somewhere in the EEPROMs or Flash ICs on the TuringPi a unique pre-deployed MAC address per TuringPi2 board could be found.
d
Oh that'd make things so much easier indeed
j
That is a nice trick, looking into this.
w
Nice! Will give it a whirl tonight
d
May I make a few suggestions? And I'm keen to learn what you think about them. 1.
USB 2.0
->
USB_OTG
2.
TTY
-> `UART` (since this does not have to be
TTY
in my understanding) 3. Could USB Host/Device be replaced with
Host (switch) Device
for easier access? (switch) is the same or similar element to how you toggle the node power 4.
Server
(Section) - I'd call it
BMC
I have a few more, but first I want to find out what you think about these suggestions
p
Sure thing, I'm not a frontend/UI/UX guy so no real opinions other than needs to be used with one hand while working 🤣. 1. 👍 2. 👍 It's basically 1/2 TTY, saves what you type when you press enter it submits the command to the BMC. UART is closer to what it actually is 3. 👍 Very possible, I'll see where I can sneak it in, maybe the header of Nodes 4. 👍
d
I mentioned some other things, but I'll wait for when we throw more functionality to the firmware since a bit more work will be needed and there's no reason to change a single thing twice
Otherwise I really like the interface. Nicely done
w
Ah, real tty emulation would be neat but ofc this is way better than nothing
p
Yea ran into the same thing a few times, had to stop myself from locking it into any one API. The most widely adopted firmware builds are official and ce so they are a good start! I'm just going to maintain different versions,
main
branch will always be CE/Official until they diverge.
w
I also see a Docker compose file there - bet most people will want Kubernetes manifests to deploy to their cluster - should be an easy fix to turn those 3 directories into config maps and add an ingress
d
Do you think people will want to deploy this interface to the cluster instead of having it on the BMC?
p
Yea we are going to have to implement our own protocol, have a few decent ones and going to check out
ser2nets
. The maintainer of tty-share is thinking of porting to Rust/C: Discussion here: https://github.com/elisescu/tty-share/pull/68
w
The proxy. Idk I assume the BMC can't run Docker
d
Why would it have to?
p
That is just to patch the headers while not working on localhost. Until we update the BMC firmware to have the correct headers we need a way to develop frontends
d
Why not take the interface files and put them in the https server on the BMC?
w
So wait do I need that proxy thing in production?
Or is it just for local development?
p
No proxy in production, only development
BMC API has text/plain and no CORS headers
w
I see ok
So I don't need it as a user
d
And that'll change since I want to also gzip the files, but this is a different story 😄
p
It will error in the browser console because of the headers but the new UI will recover. Once the api is patched then the proxy is no longer needed. We can let users configure CORS at some point, it's best practice to not allow cross-origin requests
w
This is sounding more and more like a good idea
Some kind of plugin architecture
No idea what the interface would be
Maybe just an idempotent shell script that ensures everything is set up
d
I think we just wrap things into the services that we start/stop on demand. But we'll see. Thinking alone won't turn the ideas into the real features 🙂
I want to handle this firmware right so this is why I put so much time into thinking and finding out solutions and what I like
w
I suppose a fully fledged API would end up as basically reimplementing Ansible
d
I don't know Ansible, to be honest. If we get close - let me know 😄
p
There are a lot of interesting paths, any one of the configuration engines would be fun to integrate. Ansible is a good one since it's relatively light (minus the new features). HCL is also another great option. There is the possibility of creating "recipes" with https://www.packer.io/ and just baking everything into images. Then the BMC just has to manage flashing and shutdown/startup for the most part. We just manage a list of "recipes" and images https://linuxhit.com/build-a-raspberry-pi-image-packer-packer-builder-arm/ I was hoping to get a WebTTY service running on the BMC, that's my current short/long term goal. Then deploy the same service onto the nodes. Most likely going to use tty-share's protocol. That's my current path at least for management.
w
I'm just thinking out loud what kinds of things a plugin or add on would want to do. Run services, create files, modify files, so on and so forth
It quickly approaches the capabilities of a general purpose server configuration tool
So then you have a choice. Structure everything in which case you've reimplemented something like Ansible. Or go free form and everybody has some common free form tool like a shell script that ensures everything is set up
My instinct is the latter is more in the spirit of what we would want to do. YAGNI and all that
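To make the free-form option concrete, here is a minimal sketch of what "an idempotent shell script that ensures everything is set up" might be built from. The helper names and the example path are hypothetical; nothing here reflects an agreed plugin interface.

```shell
#!/bin/sh
# Hypothetical building blocks for an idempotent add-on setup script:
# each helper changes the system only if the desired state is missing.

# Create a directory unless it already exists.
ensure_dir() {
    [ -d "$1" ] || mkdir -p "$1"
}

# Append an exact line to a file unless it is already present.
ensure_line() {
    file=$1; line=$2
    [ -f "$file" ] || : > "$file"
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage inside an add-on (path and setting are made up):
#   ensure_dir /etc/myaddon
#   ensure_line /etc/myaddon/addon.conf "enabled=yes"
```

Since every helper is a no-op when the state is already correct, the same script can run on every boot or after every firmware update.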
p
Had this running on the BMC but it was way too heavy: https://tty-share.com/ even after I stripped it down to just the protocol/wss. Then started porting it to libwebsocket and the original dev is also interested, might get a free as in beer WebTTY
w
The worst option of course would be to try to build one's own generic configuration management system
d
Well, and this is exactly what I may end up doing, at least to start
I would not call it the worst, because we do not have to use what's already developed and try to make it fit our purpose. I don't know yet, we'll see
p
It wouldn't be difficult to just prebake a few images with a configuration agent or let users build their own image. Then the focus is only on "applying" configurations. Let the infrastructure as code frameworks deal with the integrations
d
I have some ideas on how to make the BMC configure the nodes in a way that's easiest for users. This will involve some custom solution I think but would also not need any images crafted specifically for this purpose
p
I think it will be difficult without a standard image since UART/serial settings will be different on every unknown OS. like shell login etc
d
It'll work more like you put a standard image on the BMC's SD card, choose a node to flash, choose settings and it'll do everything else for you, even install some services if we end up using some (for example for remote power down of the whole cluster)
Well, you might not even have to put an image on the card, it can be downloaded for you, one of the official ones
p
Sounds like a perfect fit for managing prebaked images, then the UI in the BMC just shows what's available on github/etc and pulls from there
d
And our task here is to support as many node and OS combinations as possible. If this firmware lifts off, we'll have the community finding ways and making PRs
Well, I have a lot of ideas of what to do and how to do that already. I want to add node flashing from the BMC. but this is all a lot of custom solutions that have to be developed
If we, however, decide we'd like to support some way of automating the deployments, we'll add this too
Sky is the limit 😉
p
For sure, I generally agree with what you are saying. Node flashing is a really good start. Then it's just a matter of patching the image or offering prebaked images. I think having a repository of configurations would be the best bet. It is a good separation of concerns. Also good place to store images and different use cases in general, doesn't even have to be TP related.
w
Cloud-init is how Ubuntu does this
I mean look if somebody can figure out how to route the flashing USB lanes to the BMC and mount it I'm sure they'll be very popular
Maybe there's a schematic somewhere that would help
d
I did already by reading the firmware
p
Might bake a few images next week just to have around. Cloud-init is a great utility that gets abused in the best possible ways.
w
Oh?
d
I have ideas how to implement flashing from the BMC as I mentioned 🙂
And for the BMC flashing I'm not going to pre-bake any images. I feel there's no need to since I already have some ideas how to configure them on the fly
Well, when I said I'm figuring things out to make the firmware in the right way, I really meant it 🙂
I even found possible buffer-overflows that can cause the bmc process to crash, so I'll fix that too
p
It's BMC agnostic, just meaning having TPI installed and built for the appropriate architecture. I don't think the BMC is going to be able to do that on the fly
d
I don't want to modify the images in any way, this is what I meant. I don't think there's a need to do so because we have a BMC
w
You're a machine, man
Well, a machine whisperer
p
Let me clarify, this isn't for the BMC directly. It's just general use case images. For instance Ubuntu images on the Raspberry Pi imager do not come with
extra-raspi
package installed. Just have a "Turing Ubuntu 20.04" that has it pre-installed so that the sata controller works as expected. Or is the expectation you put in a storage medium and send literally any image and the BMC will just "figure it out"?
d
I've heard both that
extra-raspi
helps and does not help. But this is not the point here. I understand but, like I said, I also have some other ideas I want to try, like using the regular images and configuring them after installation with some sort of packages. This will let us not have to host any images. I'll write more about this idea when I check it and prove it working or not and when I shape it in a share-able way
So, to make the firmware development happen I bought this:
You can recognize the chip there
Then there is this spot:
And it'll take one of these:
And then one of these:
The last image is the exact SPI Flash that's on the TPi2. I have 10 of them
So basically this is the standalone BMC that I can then hook up to the Waveshare/Nvidia DevBoards for when I need UART or something
This would let me do most of the development and keep it outside of the TPi2
Also Flash wear out won't be a problem
Oh, maybe you actually cannot read the chip name there:
p
Should put a doc together with your dev kit once it's going, it would help people with their shopping lists. I already picked up a few things @DhanOS (Daniel Kukiela) has been experimenting with. Maybe even get some affiliate links going
d
I put links in the chat a few times I think
Let me find them
The last link does not work anymore, but you have a model
Also for the SOP8 socket - it was hard to find exact dimensions and it's a bit too wide for MangoPi and I haven't figured it out yet. But I will
You can also solder the SPI Flash directly, but I want to have a way to swap them easily
I might just extend the pads with some wires and solder the socket a bit further away and make it stick with a bit of hot glue 😄
I'll let you know once I join all of the elements of this puzzle 🙂
Oh, right, I also bought these:
Just in case
Might prove to be useful
I can de-solder the connector:
Put it flush and use some wires to connect both up
p
Added them to DEVKIT.md in the repo. Never heard of Allwinner up until the TP.
Going to polish the toggle style and add firmware uploads tomorrow, hope everyone enjoys their evening/weekend! Render with @DhanOS (Daniel Kukiela) changes: https://turing-pi-urbirx0n6-telluric.vercel.app/
d
As for
USB_OTG
I kind of thought to keep the
_
there so it's the same as on the board and in the official docs 🙂 OTG Host/Device - I understand it's in works and you moved it, but if this helps I thought of something like this:
w
I imagined two columns of radio buttons all linked
Mode horizontally, node vertically, exactly one of the 8 radio buttons can be selected
Idk maybe radio buttons are passé
So I installed it but nothing changed. Different URL/path?
Ah I don't have an SD card
The deploy script silently fails in this case
hmm, I fixed that but now the web UI isn't loading, console says: http://192.168.124.54/api/bmc?opt=get&type=other net::ERR_EMPTY_RESPONSE window
times out it seems
d
I personally like the switch. Not sure about the position yet, though
w
OK so the community edition does not have this commit in it https://github.com/PhearZero/zero-pi-2/commit/b843660e1a57e4cc64e6ab50c70a5fc5805fedfe
also the BMC webserver firmware never responds to an HTTP request if it doesn't recognize what it is that is being requested
d
It is going to have only part of it, if anything. To fit the UI on the TPi2 we'll have to make it match the firmware
And this is something to look at. Should return HTTP 404 (or another error according to the situation)
w
I wonder if the UI repo deserves its own thread
well, the UI generally
it does claim to be compatible with your CE firmware
d
When the development starts, but I mean really starts, there will be so many topics that could be their own threads. The problem is I don't want to take over this place with my firmware and I'm thinking of a separate server for the firmware development
By server I mean Discord server, of course
I've never checked that so I don't know if it is or not
w
any idea what license all this stuff is in?
I don't see one specified in GitHub
d
Good question, I don't. The Team calls it open-source
w
I was going to send a PR to quickly return 401/404/whatever in this situation, but by far the most uncomplicated way to do that is if it's released under an open source license
otherwise (or if it's AGPL or similar) I need a special release for my employer to waive their copyright interest in my work
oh well, I faked out the 'other' request for now by hacking the UI code. If I knew anything about newfangled web dev I'd make it optional or something
p
The bmc webserver needs a lot of attention, didn't realize "other" wasn't in CE. Must have forked from a while ago. We can update CE to at least match official api. Then just mount a new action for anything specific to CE
Yea it's a good idea, going to polish it this evening and then probably do a big 1.0.0 ann in a thread tomorrow evening.
w
can I suggest changing the UI code so that one failed request doesn't block the whole UI loading?
like if the 'other' request fails, the UI should load and just have 'unknown' in the relevant fields
n
I know I’m a bit late to this but is there a clear indicator that a change hasn’t been applied yet? With the official firmware, I’ll occasionally toggle a power switch, forget to apply it, and spend a minute trying to figure out why the node isn’t up. I would probably still do that occasionally if there was an indicator but I would figure it out sooner.
c
Hey @DhanOS (Daniel Kukiela) -- figured I'd loop you in since you're maintaining the big fork in the community. I'm hoping to get the structure of the (newer official) firmware repository cleaned up, but since that repository is still tracking the legacy/official one, I suspect that this reorganization may land upstream too, in which case it will probably then get pulled back downstream into your fork (this is such a hot mess I swear)... Anyway, since you might be impacted and likely have opinions about how this stuff is organized, I'd love to get your perspective on this:
j
👍 I like this, you basically are moving stuff that is externally "glued" into buildroot via the set of scripts into the buildroot directory 🙂 I did that partially, too for fun.
c
Thank you for your feedback! Yeah, the goal is that and to make the build process easier. If I'm about to start hacking on the devicetree and kernel drivers, I need to be able to just run `make` -> reflash and not have problems from, say, forgetting to copy over the .dts
j
For fast turnaround I created a flash.sh for myself: https://github.com/j0ju/turing-pi/blob/main/flash.sh, but I guess you have your own tooling already
c
This looks faster than what I'm planning on doing. Though, this script looks like it's meant for updating rootfs only (.swu) and not the kernel/dtb?
j
jupp, the generated .swu only contains the rootfs. Kernel upgrading is only possible by flashing the .img via LiveSuit/Phoenix Suit
c
My plan for kernel is to build as a uImage (not Android bootimg, no idea why it uses that currently) that bundles kernel+dtb, then I upload that to microSD, reboot into U-Boot, and use fatload to boot my test kernel.
j
I currently have a kernel running that I am fine with, and I have not yet found a way to do it like that.
c
That way if I screw something up I just reboot.
j
after `sunxi_card0_probe` the mmc subsystem works fine in the supplied uboot. I tried using kexec to start a kernel on demand from the sdcard, but that is currently a mess of kernel stacktraces and I do not feel tough enough for that rabbit hole 😕
c
Have you been able to find the "do not autoboot" interrupt key/sequence in the shipped uboot, or is there not one at all? I've been resorting to just removing the `boot_normal` from `bootcmd`, but wasn't sure if you've cracked that particular case.
j
see https://github.com/j0ju/turing-pi#troubleshooting
Copy code
Serial Console: if you cannot enter the serial console of UBoot although a bootdelay>0 is configured flash the .img once.
This suspicion was confirmed yesterday. The uboot envs are identical between what I dumped from my untouched TuringPi2 and what is generated by the build from the vanilla 1.0.1 repo. At least the relevant code paths/variables called by runcmd are identical. So it might be some other difference. As far as I understand, the .img also contains a U-Boot
c
"Flash the .img once" meaning the shipped uboot does not respect
bootdelay>0
but .img updates include a uboot that resolves that issue?
j
yes, that is my current working hypothesis
c
I have a freshly-dumped mtd1 if you want to analyze it.
j
I tried to set different values or delete bootdelay, but it boots immediately
mtd1 is the U-Boot. I have a vanilla dump of that here, too. It might be interesting to know whether multiple versions shipped on different boards, but I will not reverse engineer it if I don't have to
c
Mine has sha256sum: `7e54140e013d66592dd8cd3b34ff5aaf6102df05d8f2d5d4a8d9836d653b742b`
j
I used this to dump:
Copy code
#!/bin/sh
# Dump every MTD partition and every UBI volume to a gzip-compressed file.

cat /proc/mtd > mtd.txt
while read -r mtd _ _ name; do
  case "$mtd" in
    mtd[0-9]:)
      mtd="${mtd%:}"        # "mtd1:" -> "mtd1"
      name="${name#\"}"     # strip the quotes around the partition name
      name="${name%\"}"
      dev="/dev/${mtd}ro"   # read-only MTD character device
      f="$mtd-$name.bin.gz"
      gzip -1 -c < "$dev" > "$f"
      ;;
  esac
done < /proc/mtd

echo "name          size" > ubi.txt
for ubiname in /sys/class/ubi/ubi0_*/name; do
  [ -f "$ubiname" ] || continue   # glob did not match anything

  ubisize="${ubiname%/name}/data_bytes"
  read -r name < "$ubiname"
  read -r size < "$ubisize"
  printf '%-13s %-d\n' "$name" "$size" >> ubi.txt

  ubi="${ubiname%/name}"
  ubi="${ubi##*/}"          # e.g. "ubi0_5"
  dev="/dev/$ubi"
  f="$ubi-$name.bin.gz"
  gzip -1 -c < "$dev" > "$f"
done
Copy code
3f35d43f10ef9e97a83433776f415350e8899557943ae093f17ebd9126958394  mtd0-boot0.bin.gz
cf6f61ed7b76d0d0628523a11d25afabf31d762111d83ccdd7ed38c24a9dbedc  mtd1-uboot.bin.gz
c1ecaad3d748ba9084d5b686a616fbaa29fb16953fd0eb314d8aeed0fac2fc77  mtd2-secure_storage.bin.gz
927e1a1bdb769278bc0c5ea9c712fe8cbc604c43345c0e7df1855370f1edbb6b  mtd3-sys.bin.gz
ed696d6ea86fc9d15b92cd123d7f3928bc630c082bbe2072bb4b7fe16f39b4af  mtd.txt
d8b2bf7517c044076b76dd6bb614748d7a0ee2277b91035b5d0bcf614bdd4794  ubi0_0-mbr.bin.gz
285557ad2cef6e131645144fcf2b4c2ef08c887e3224a04f43c4207350753f35  ubi0_1-boot-resource.bin.gz
8119bfe5d3b6a1c8a0ce24767d1150ec7fba87f93738a4cea799199b29b6a645  ubi0_2-env.bin.gz
a5a454cbeae2adaae7d8e4fe7d050ad8e42c5ec1af1fb1b8c372a6f70b5acf26  ubi0_3-env-redund.bin.gz
2d443bda03f69367c381a62e2c8c01459e414bca3cbbfe1f124433c288a4f01a  ubi0_4-boot.bin.gz
661699a360f8c882127e3a724e890bbe9f3a8073e8d2e305315abdb2c95ee2b6  ubi0_5-rootfs.bin.gz
1cfc89d37867e4aff2398c784bbac06e33d0fe87d4e6d568467f85ea769231d9  ubi0_6-recovery.bin.gz
43e99eb978f3aeb0e8e100ee626c05ecc777ec2953b9153a0112186e69ad6b70  ubi0_7-dsp0.bin.gz
98c03cfc5852a7caacabbb3dde1b162a120f340bc63797945c580cf62210bd8f  ubi0_8-private.bin.gz
feb473eec89d914f2dbbabc47ff2478d4680a4b8570a4888872c61b5245a2ce8  ubi0_9-UDISK.bin.gz
7b2e27d9149fd0430d7ec57b0dd5025cb8596c2b6c25589da6060366741cc0df  ubi.txt
Vanilla dump with 1.0.1 installed.
c
Mine is a factory dump (no .gz)
j
you are right ...
c
What's the output of `gunzip < mtd1-uboot.bin.gz | sha256sum -`?
j
7e54140e013d66592dd8cd3b34ff5aaf6102df05d8f2d5d4a8d9836d653b742b mtd1-uboot.bin And that means it is identical to your untouched rom contents.
After flashing the .img it is currently 8eae115254642bfe52e5b69572d8979ba9b5c72672a73ea5bbb171b2fc2b1c4c, so it is updated by the .img flashing procedure
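A minimal sketch of that comparison, for anyone who wants to repeat it: hash the uncompressed contents, so a gzip'd dump and a raw factory dump can be checked against each other. `raw_sha256` is a hypothetical helper name, not something from either repo, and it assumes `gunzip`, `sha256sum` and `cut` are available (GNU coreutils or busybox).

```shell
#!/bin/sh
# Hypothetical helper: print the SHA-256 of a dump's *uncompressed* contents,
# so that .gz dumps and raw dumps produce comparable hashes.
raw_sha256() {
  case "$1" in
    *.gz) gunzip -c "$1" ;;   # decompress to stdout, leave the file alone
    *)    cat "$1" ;;         # already raw
  esac | sha256sum | cut -d' ' -f1
}

# Example (file names as used in this thread):
#   raw_sha256 mtd1-uboot.bin.gz
#   raw_sha256 mtd1-uboot.bin
```

If both calls print the same hash, the two dumps hold the same bytes regardless of how they were stored.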
If we could get one more confirmation for this hypothesis: this is a possible way to brick your TPi, and it should be documented and fixed. For example, suppose someone starts developing without being able to enter U-Boot via serial interrupt. If some weird bad-update situation left both ubi0_5 and ubi0_6 in a reboot cycle, the board would loop endlessly. Both known flashing/recovery methods need the TPi2 either in the "Linux booted" state or in "adb/Phoenix-/LiveSuit update mode", and in this kind of reboot cycle no interrupt would be possible. --> BRICK. To prevent this, the update procedure with PhoenixSuit/LiveSuit should be done at least once. Alternatively, carefully update mtd1:uboot via the console with a known-good version (UNTESTED!).
So this should also be documented somewhere as a best practice for people who start developing or playing around with the BMC.
c
I'm not 100% clear on this but doesn't the T113 bootrom first check for a particularly-formatted microSD and, if it finds it, it grabs boot0(, u-boot, ...) from that?
So if you royally screw up your SPI-NAND, you can just pop in a microSD and get booted that way, then fix the NAND?
j
If that's the case, that would be good. Otherwise there would be no last resort in case the current boot UBI partition (ubi0_5 or ubi0_6) gets fucked up for any reason with a vanilla U-Boot.
c
Yeah, and I noticed no pin headers or anything for accessing the NAND, so even with modest tools I'd have no way of debricking (unless the microSD thing works)
j
Ack
Fun fact: the space in ubi0_6 is a little smaller than in ubi0_5. For now I am unsure why. If you add so many packages or kernel modules that, during the post-build script, the rootfs.ubi fits into ubi0_5 but not into ubi0_6, the process that prepares the .img creates only 8 instead of 9 UBI volumes. Even the partition named "recovery" vanishes if the resulting .img is flashed.
c
I need to learn about how the .img itself is made. The "tina-pack-tools" are kind of a mystery black box to me right now.
j
For me, too.
c
Could I trouble you to compute the SHA256 for this, but truncated to only the first `950272` bytes?
j
Sure. Is that the U-Boot size in flash, counted from the beginning? Are you guessing there is different random flash content at the tail end? 👀
c
I'm looking to see if this matches:
Copy code
turing-pi master $ sha256sum br2t113pro/board/100ask/dragon/u-boot.fex 
e70f8c152195000e5a6f4fd2149926e63f26a8801f301909818b102d17d190e8  br2t113pro/board/100ask/dragon/u-boot.fex
turing-pi master $ ls -l !$
-rwxr-xr-x 1 cfsworks cfsworks 950272 Apr 22 23:50 br2t113pro/board/100ask/dragon/u-boot.fex
j
isn't the .fex also prefixed with some kind of header?
c
One would think but it's copied verbatim into the .img:
Copy code
turing-pi/buildroot/output/images master $ dd if=buildroot_linux_nand_uart3.img bs=256 count=3712 skip=852 of=onlyuboot.fex
3712+0 records in
3712+0 records out
950272 bytes (950 kB, 928 KiB) copied, 0.00892286 s, 106 MB/s
turing-pi/buildroot/output/images master $ sha256sum onlyuboot.fex u-boot.fex 
e70f8c152195000e5a6f4fd2149926e63f26a8801f301909818b102d17d190e8  onlyuboot.fex
e70f8c152195000e5a6f4fd2149926e63f26a8801f301909818b102d17d190e8  u-boot.fex
I'm curious to know if I can just overwrite my /dev/mtd1 with u-boot.fex and whether that is sufficient for resolving the "it does not respect bootdelay" problem.
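For reference, the `dd` call above reads 3712 blocks of 256 bytes starting at block 852, i.e. 950272 bytes at byte offset 218112. A rough sketch of the same extraction with byte offsets instead of block counts follows; `extract_range` is a hypothetical helper, not part of the repo or of tina-pack-tools, and it relies on `tail -c +N` and `head -c`.

```shell
#!/bin/sh
# Hypothetical helper: copy LENGTH bytes starting at byte OFFSET out of IMAGE
# into OUT, using only tail and head.
extract_range() {
  image=$1 offset=$2 length=$3 out=$4
  # tail -c +N starts output at byte N (1-based), hence offset + 1.
  tail -c "+$((offset + 1))" "$image" | head -c "$length" > "$out"
}

# Example with the numbers from the dd call above (852 * 256 and 3712 * 256):
#   extract_range buildroot_linux_nand_uart3.img 218112 950272 onlyuboot.fex
#   cmp onlyuboot.fex u-boot.fex && echo "identical"
```

Using `cmp` rather than two `sha256sum` runs is just a shortcut here; either shows whether the .fex was copied verbatim.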
j
Copy code
~/src/build/turing-pi/bmc-release/turingpi-1.0.1.dump > dd count=1 bs=950272 if=mtd1-uboot.bin status=none | sha256sum
943e4a3eb9cfc624a6b63bbfd1fc0fd4a6ec0cecf357dfd2675f227f89cd9695  -
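The truncated hash above can also be turned into a direct yes/no check. This is only a sketch under the assumption that the flash partition is longer than the file written into it: `starts_with_file` (a made-up name) hashes just as many leading bytes of the dump as the reference file has and compares the two.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the first bytes of DUMP are identical to the
# whole of REF (e.g. a padded flash dump versus the shorter file written to it).
starts_with_file() {
  dump=$1 ref=$2
  size=$(($(wc -c < "$ref")))   # arithmetic expansion strips any padding
  want=$(sha256sum < "$ref" | cut -d' ' -f1)
  got=$(head -c "$size" "$dump" | sha256sum | cut -d' ' -f1)
  [ "$want" = "$got" ]
}

# Example:
#   starts_with_file mtd1-uboot.bin u-boot.fex && echo "flash starts with u-boot.fex"
```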
c
So, probably not a great idea to overwrite my /dev/mtd1 with that just yet
Ah, .img is a specific format for LiveSuit ("IMAGEWTY"), and .fexes are packages that are concatenated in there.
j
and in the mtd1 dump there is a prefix "sunxi-package:...HEXGIBBERISH..."
found that while comparing uboot.fex from my last flashed build with the one in flash.
Maybe some hints for the Allwinner bootrom?!? I have not been that deep into these embedded ARM boards before
c
Gotta say I am not a fan of prebuilt binaries in the repo... Wonder if my next cleanup step should be to get u-boot building rather than bundling a prebuilt one.
d
There's a recovery SD card image you can use to re-flash the BMC
Thank you. I will take a look. You created a pull request against an old official firmware which is not likely to be changed/updated in this way. The team is working on an entirely new firmware. So I doubt I'll be affected, but I am going to take a look at it anyway and maybe use your pull request in the unofficial firmware. I have had a few days off and I'll finally shape the development of this firmware in a few days. I'll check and think about whether we can use your work here.
c
Slight clarification: the wenyi repository is the old one, this is the turing-machines repo which is hosting development for the redesigned firmware
d
I feel like I missed something (I was off for a few days) - where does it state this will be the new firmware?
c
(Also second paragraph of )
d
Ok, thank you. I did not see this repo and was unaware of it. It seems like the new firmware is going to be based on the current one and I even see that all of the "community" editions should be merged with this one, which puts a question mark on what we were going to do here
c
"we" being you and I?
d
I mean everything in this thread
c
I would guess "same thing, new location"?
Though fwiw I'm under the impression that the new firmware is going to be a sort of "Ship of Theseus" - it has started off being a fork of the current one but each part will be replaced iteratively, and by the time it launches it won't have much at all in common with the legacy firmware.
(What I mean by that is, whether this counts as the new one being "based on" the old one is as much of a question of perspective as whether the Ship of Theseus is still Theseus's ship)
d
I thought that the new firmware was going to be created from scratch and would take some time to even have a beta version. This meant we could improve this one with the most wanted features, as it was always meant to be only temporary unless the users want to continue using it. If the changes are going to be more iterative, people will most likely stick with the official firmware, and this is the question mark. We'll see 🙂
c
Ah - yeah I guess I have that same question mark then: "How 'usable' does Turing Machines intend to keep it while it's under construction?"
The userland might be redone from scratch, come to think of it, while the boot configuration, kernel, drivers, etc. are updated gradually.
t
as a mere user who might submit opinions or ideas, I too was surprised by this apparent change; I thought the community firmware was going to be a totally community production and the official Rust-based version would supersede it upon its own completion
I guess it was an assumption
c
It does look like they intend to rip-and-replace all C with Rust
(Though, PR #2 makes me think the replacement will happen in stages.)
t
then I'm confused if the rip-and-replace part basically signals a hold on the community edition development while new APIs are solidified?
in order to eliminate duplication of effort
e.g. faster transition to the official Rust-based version
or am I misunderstanding something?
c
I don't really know the plan myself either. I'm just a guy who stumbled on their official repo and started cleaning up around the edges and found that they're quite receptive to those changes.
t
I guess my questions are potentially rhetorical... 😉
c
It may be that the word "firmware" is kind of an overloaded term here, because it can refer either to the literal image on the NAND flash that boots up when you power up your board (including a bootloader, kernel, init system, ...), or it can refer to the "bmc" daemon that hosts a webserver (+ other interfaces???) and dispatches those requests -- i.e., strictly the userland.
And each time they've said "new BMC firmware" the features they talk about are more related to the latter than the former.
t
I guess I was thinking of the bmc daemon in this case
c
Ah yeah. I have no idea what the plan with the bmc daemon is. I see they're working on adding a Rust library that can be called from the existing C code, and the intent seems to be to implement flash-over-network support before they have the new Rust daemon yet (and they want to write the reflash code in Rust so they can reuse it in the new daemon).
Personally if I were doing this, I'd recode the entire `bmc` binary in Rust (there isn't much there, should take a weekend), put the current API in `api_legacy.rs` or similar, and then start working on `api_v1.rs` - but it seems there's a desire to do it differently.
t
I was curious if the prototypes/mockups of the web interface posted earlier might be implemented in the CE version sooner rather than later; the new repo signifies a delay to start merging them... at least from my perspective
d
I want to implement them as soon as possible
c
Haha, hence my remark that "this is such a hot mess I swear"
d
In a few days I'll write down all of the feature ideas and bugs as GH issues and shape v0.2.0 which I want to include a new GUI
I have just had a moment of doubt, but I think continuing what I started here is a good idea still and then we'll see where it gets us
t
and after that you'll reevaluate waiting until this is integrated into the new/official version?
c
Since CE is meant to be temporary, it might also be good to start thinking about what both versions (new/CE) need to have to facilitate easy migration onto the new one once it's ready
d
I won't be waiting for the official version of the firmware. In the end it's the users who will tell us if our effort here is worth it 🙂
t
seems sensible, only issue is easy migration is hard to plan for if we don't know what we'll be migrating to
I meant after the to-do features on the CE GH are done
c
Well, one might be a request that the new firmware have (at least temporarily) a backwards-compatible implementation of the legacy API.
d
The CE is meant to be temporary unless people want to continue using it and maybe help out to make it better. The flashing process will always let people flash the official one, I'm not going to break this.
c
I wouldn't expect it any other way - but simply being able to wipe and start over back with the official firmware isn't necessarily a "migration path." I was thinking something more along the lines of a mechanism to keep settings interchangeable.
d
I'm not sure what is going to be integrated into the official version, or where. If anything. The CE will be living its own life for now and then we'll see what's next. The CE version should not wait for the official version on anything, since that is not the point of why it exists
t
agreed
d
This is a good idea and we might provide a way to migrate settings to the format that the official firmware will use, but for now we can only wait and see the development progress.
t
I guess I let my anxiety get the best of me
c
That, or we start thinking about a common format that both editions of the firmware can use. Perhaps something stored in the EEPROM so it's kept even across a full NAND reflash?
t
that suggests a common specification needs to be drawn up for use by both editions... 😉
d
What causes your anxiety here?
I'm not even sure if the CE is going to be recognized to the point that the dev (or the devs) will put any time into making anything common (since I read the CE should become a part of the new official firmware)
t
reading about a piece by piece transition, that the new one was still in planning, implied that the more work was done on the CE, the more would have to be reworked for integration into the official one... got my mind racing on if that would basically halt work on the CE
d
So far, I was against touching the EEPROM so as not to cause any trouble with the future official firmware. My idea was to put the root onto the SD card so the config file(s) could be moved across boards, and the BMC OS upgrade was meant to work more like updating a modern Linux-based system (no need to flash anything, just run an update command)
t
I support continuing on CE, as @DhanOS (Daniel Kukiela) stated...
d
> if that would basically halt work on the CE This is also kind of my fear
t
wants more pretty features and web UI ;-)
c
Well, I mean, using the EEPROM in the same format that the new firmware's design (eventually) specifies.
(If it does specify using the EEPROM.)
t
yeah, too many assumptions being made at this point
c
What I'm saying in general is we can watch the trajectory of both CE/new repositories with an eye toward making sure they remain "compatible," whatever that ends up meaning.
d
But this would also mean that the CE will always be behind the official firmware, which is the opposite of what I want the CE to be
c
Would it?
d
Hard to say for sure, but I guess so
Any planned feature will either need to wait for the official firmware or do things "partially" on its own
c
I've seen firmware replacements for devices that use large EEPROMs to store config; the official firmware used a `key=value\0key=value\0` format, so the community firmware just invented its own keys for the features not available upstream.
That's not to say the TP2 firmware will use k=v, just that it isn't necessarily the case that sticking to the same format limits the CE.
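To make the k=v idea concrete, here is a small sketch of reading one key out of such a NUL-separated blob. It says nothing about what the TP2 firmware will actually store: `kv_get` and the `ce_power_order` key in the example are made up for illustration, and the sketch assumes values never contain newlines.

```shell
#!/bin/sh
# Hypothetical helper: look up KEY in a blob of NUL-terminated "key=value"
# records and print its value (prints nothing if the key is absent).
kv_get() {
  blob=$1 key=$2
  # Turn the NUL separators into newlines, then match on the "key=" prefix.
  tr '\000' '\n' < "$blob" |
    awk -v k="$key" 'index($0, k "=") == 1 { print substr($0, length(k) + 2); exit }'
}

# Example: an "official" key next to a community-only one (both invented here).
#   printf 'mac=12:34:56:78:9A:BC\000ce_power_order=3,4,1,2\000' > config.bin
#   kv_get config.bin ce_power_order
```

A firmware that does not know a key can simply skip it, which is why extra community keys do not have to break the shared format.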
d
We'll see, I guess
c
It sounds like most of the community wants an open process for the new firmware's development. I'm one of them -- would love that to be a "we'll strive for it" rather than "we'll see I guess" 🙂
d
What do you mean by "new firmware"?
c
"The thing that will be factory-shipped on all TP2 boards hopefully by the end of the year"
d
The "we'll see I guess" was strictly about the CE version following the official firmware EEPROM storage. I don't know about the official firmware, but the CE will always be open and community-driven as long as people will be interested in using it and working on it 🙂
c
Yeah - I'm just trying to encourage not taking a fully passive role about the new official firmware either. I don't know if the settings storage is finalized yet, but if not, maybe think about what you'd like for it to have so users can most easily switch back and forth between it and CE
(Also part of why I ran the reorganization commit by you is to keep CE and the new official firmware from diverging too much - you may end up periodically pulling "new firmware" features down into CE from time to time, and I want to make sure that this process remains smooth.)
d
The question is if your PR will be accepted 🙂
c
Sven's initial feedback makes me think it will be - perhaps with some slight modifications.
d
OK, more important question - when 🙂
c
But, still, if the official firmware is being restructured, and CE ends up needing to restructure itself in the same way, is that organized in a way you find to your taste?
d
These changes should be integrated probably before any other work to avoid having to re-do the features added by the commits
I will take a close look at your changes in a few days. I feel like I might have my own ideas about some things, but then, if we want to keep it close to the official firmware, I might need to pull this PR anyway. You also seem to know much, much more about it than I do.
c
Looking forward to it! (And if you do have your own ideas about these things, please do post about it on my PR. My whole reason for running it by you is so you don't end up blindsided by any "upstream" change you ultimately don't like.)
j
Oh cool, I didn't know. Is this publicly available somewhere for fun?
d
c
(^ Earlier today I was thinking about how I would love to have this process be more like...
1. Download the .img
2. `dd` it onto an SD card you don't care about
3. Put it in the TP2 board, power it on, wait for the power LED to start steady blinking
4. Hold KEY1 for 5 seconds (power LED starts rapidly blinking)
5. Once the power LED is no longer rapidly blinking, remove the SD card and push the BMC reset switch.
)
j
Oh, I never looked at version 1.0.0, thanx
d
Or you can have the root on the SD card and a fallback OS in the Flash (in case of the SD card issues) and you never have a need to flash anything again 🙂
c
Doesn't that sacrifice /mnt/sdcard?
d
Sacrifice in what sense? What would be the purpose of it otherwise?
c
Storing persistent data that isn't overwritten by a firmware upgrade?
d
Why couldn't this data be stored there along with the OS, which can be updated by a command (like in modern Linux-based OSes)?
c
...what is that command?
j
.... like buildroot ?
d
It does not exist yet, I haven't checked the possibilities of adopting any of the existing systems, yet
c
Well, you just answered your own question then.
d
No, I mean delivering the compiled binaries, more like `apt` with Ubuntu
Elaborate, please?
c
Q: Why couldn't this data be stored there along with the OS, which can be updated by a command (like in modern Linux-based OSes)? A: [The command] does not exist yet
j
I like the approach. buildroot offers opkg integration. But then I would go more in the OpenWrt direction ...
d
I still have a hard time understanding what you mean. The idea is to move the root onto the SD card and be able to update the OS with a command. As much as I'm against storing any data on the system disk, there's not much of a choice here, so why not use /var or /etc (depending on the use case)? Am I missing something?
My ears are open if you want to say more. You both definitely know much more about that than I do
c
So, the idea is to use the microSD as the NAND is used today, and the NAND is there but 100% vestigial (only used in case of microSD failure, as you said)? The current firmware update process just overwrites the partitions blindly. This is nice and reliable (no chance of some stray file the user left behind conflicting with the upgrade, since all stray files are blown away) but you need to keep files you care about out of the rootfs. The current way of doing that is to keep them on a partition of the SD card instead -- but we don't have partitions anymore because boot0 is now sitting where the partition table "should" be, to make the microSD bootable. So now it sounds like to solve this new problem, your solution is to update the rootfs with a package manager rather than overwriting it every time. But this is Buildroot, which is focused on building one-off images, not packages.
j
OpenWrt is designed for devices of this class. It offers switch and network support and a good set of packages. It also already has a firmware build environment integrated, with only open source tools. It already has a web interface (which also needs no binary blob). For now these are just ideas.
c
I suppose we could reverse the roles of microSD and NAND: erase the NAND and put a jffs2 on it, use that for persistent storage, then swap out the microSD if it fails.
j
you could write a uboot bootcmd for this, 1st try to init sunxi_card0_probe and fatload a zimage or execute a uboot.script from sdcard.
d
The CE version is not even using the SD card for anything (yet). It's based on 1.0.0, and 1.0.1 indeed uses the SD card for this simple config file, which is more of a temporary solution IMO. This does not prevent us from doing things a bit differently.
c
It would be great to figure out how to get a partition on the NAND that can survive whole-system upgrades. Part of that would entail not using LiveSuit for whole-system upgrades. 😂
d
Well, these are at least my ideas. I don't necessarily have all the answers yet.
Well, this is why I see it this way: you only need to flash the Flash one last time, and then all of the updates would be made on the SD card only
c
We shouldn't assume the user wants to have a microSD sitting in their board 24/7 though.
d
I made a poll and 100% of the votes were for this idea
Not many votes, though, but 100% of the interested users
c
...any chance the users who don't want to put in a microSD card are also the uninterested users? 😛
d
I don't see why people would need to swap the SD card, and for what
c
I don't know. I'm not the user.
d
I made it clear and among other options one was only Flash and one was Flash + SD card to have more space for the binaries, and 100% was on the latter. And uninterested people won't run CE anyway
c
I mean, not interested in voting in the poll.
You're drawing a total generalization from a pretty small sample size, which is one of the cardinal sins of statistics. The most you can really say is that there are users that do want it, not that all (or even most) want it.
d
Well, they have had a chance to voice their use case. There are always people who see things differently and would like to have a different set of features
c
Like, the option to boot CE off of a microSD only is actually a great idea.
But if it becomes "CE assumes you have a microSD sitting in there the whole time it's up, and you can't use it for any other purpose" that may turn into a drawback for some
d
I'm trying to create the CE version of the firmware and I'm listening to the interested people. If the uninterested did not bother to vote, how would we know what they want? I'm not generalizing, I'm trying to start the CE version of the firmware and listen to those who have something to say and want to use the CE. I can also guess what people might want or give them options, but in some cases there's not a lot that can be done and some decisions have to be made
You can't have your cake and eat it too (or however it goes). I've been thinking about a "lite" version of the firmware containing just the most important features that can fit into the Flash, and a fully-featured one with the root on the SD card. It's not like I don't have alternatives. I just presented one of my ideas to you, not all of them (and I have a bunch)
c
I'm just saying, it wouldn't be good to remove the option of writing CE to the NAND.
Hence "We shouldn't assume the user wants to have a microSD sitting in their board 24/7 though."
d
The thing is all the requested features won't fit into the flash at some point.
c
Ah, then having a "lite" build and a "full" build sounds like the way to resolve that.
d
And it might indeed be like this: if you want to use another CE version, you need an SD card, which, to be honest, is not a problematic or expensive requirement. But I don't have the final answers yet and, like I said, you cannot fit everything
Then this raises another question: what to keep in the lite version, since whoever you ask will have their own ideas or set of "required" features
c
I suspect that this is probably something that follows the Pareto rule: 80% of the usecases can be achieved with 20% of the storage space
The "lite" is just some subset that achieves that 80-20.
d
Yes, indeed. I mostly meant what we are talking about here now, that this would not be a good idea or that would not be a good idea, and the goal is to find out what people want... in general. And here we are back at the "uninterested" (and yes, we can partially make our best guesses on what to include too)
There's so much to think of and we only barely started the CE 🙂
c
If you get aggressive about it, you'd be surprised what you can cram into a some-number-of-MBs NAND though
d
I'm aware. I'm not sure how much space you freed up, but there's potentially more
c
Space I freed up?
d
One of the other ideas I have is maybe to deliver features as the modules, so you can swap them from the SD card or maybe even online?
You re-structured the repo in your PR. I figured you probably cleaned up the filesystem a bit too
c
No, I didn't touch anything that would affect the output binary.
I was referring more to like, using squashfs for the root + a jffs2 overlay for any files the user changes (OpenWrt does this), turning on LTO + `-Os` optimizations for all built binaries, using a very lightweight libc, ...
Heavy reliance on busybox (which we already have) is a good one too.
d
I've been thinking of optimization flags too, also the bmc binary is quite fat, but jffs2 overlay is something new, hmmm
c
How fat are we talking?
d
Nevermind, the memory did not serve me well on this one. The RAM usage is quite high for this process (yet another thing to work on), not the disk usage
c
Ah. Still, with a Rust-based `bmc` daemon on the horizon, we should probably expect that binary to take up a few MBs.
There's tricks you can do to get even big Rust binaries much smaller (e.g. replace the panic mechanism with a silent abort instead of helpfully-formatted tracebacks), but sometimes the cost is too great.
d
Currently it's 45MB with another 47MB used by the adbd. I remember we could probably save quite a bit of disk space by getting rid of swupdate too (but that's not really an option; I forgot the context of one of the previous talks about it)
c
Wow, 47MB used by adbd? Is that RAM usage?
d
Yes
Copy code
Mem: 80264K used, 31216K free, 108K shrd, 0K buff, 6880K cached
CPU:   0% usr   2% sys   0% nic  97% idle   0% io   0% irq   0% sirq
Load average: 0.06 0.07 0.01 3/68 29135
  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
 1100     1 root     S     2620   2%   0% {S11adb_server} /bin/sh /etc/init.d/S11adb_server start
28715 26830 root     R     2740   2%   0% top
26813  1179 root     S     5860   5%   0% sshd: root@pts/1
   15     2 root     SW       0   0%   0% [ksoftirqd/1]
19906     2 root     IW       0   0%   0% [kworker/u4:2-ev]
28113     2 root     IW       0   0%   0% [kworker/u4:0-ev]
 1092     1 root     S    47288  42%   0% adbd
 1186     1 root     S    45576  41%   0% bmctest
 1009     1 root     S    14032  13%   0% /sbin/udevd -d
27257  1179 root     S     5996   5%   0% sshd: root@notty
20367  1179 root     S     5980   5%   0% sshd: root@pts/0
 1179     1 root     S     5600   5%   0% sshd: /usr/sbin/sshd [listener] 0 of 10-100 startups
 1184     1 root     S     2700   2%   0% /usr/sbin/collectd
27268 27257 root     S     2620   2%   0% -sh
26830 26813 root     S     2620   2%   0% -sh
  994     1 root     S     2620   2%   0% /sbin/syslogd -n
  998     1 root     S     2620   2%   0% /sbin/klogd -n
 1171     1 root     S     2620   2%   0% udhcpc -b -R -O search -p /var/run/udhcpc.eth0.pid -i eth0 -x hostname:turing
20381 20367 root     S     2620   2%   0% -sh
 1189     1 root     S     2620   2%   0% /sbin/getty -L ttyS3 115200 vt100
29135  1100 root     S     2488   2%   0% sleep 1
 1111     1 root     S     1828   2%   0% /usr/sbin/rpcbind
    1     0 root     S     1376   1%   0% init [3]
  924     2 root     SW       0   0%   0% [cec thread]
    9     2 root     SW       0   0%   0% [ksoftirqd/0]
   10     2 root     IW       0   0%   0% [rcu_preempt]
  923     2 root     SW       0   0%   0% [hdmi proc]
   32     2 root     IW       0   0%   0% [kworker/0:1-eve]
   33     2 root     IW       0   0%   0% [kworker/1:1-eve]
12946     2 root     IW       0   0%   0% [kworker/0:2-eve]
   14     2 root     SW       0   0%   0% [migration/1]
   11     2 root     SW       0   0%   0% [migration/0]
  632     2 root     SW       0   0%   0% [spi0]
  961     2 root     SW       0   0%   0% [ubifs_bgt0_5]
    2     0 root     SW       0   0%   0% [kthreadd]
   18     2 root     SW       0   0%   0% [kdevtmpfs]
  948     2 root     SW       0   0%   0% [ubi_bgt0d]
    3     2 root     IW<      0   0%   0% [rcu_gp]
    4     2 root     IW<      0   0%   0% [rcu_par_gp]
    6     2 root     IW<      0   0%   0% [kworker/0:0H]
    8     2 root     IW<      0   0%   0% [mm_percpu_wq]
   12     2 root     SW       0   0%   0% [cpuhp/0]
   13     2 root     SW       0   0%   0% [cpuhp/1]
   17     2 root     IW<      0   0%   0% [kworker/1:0H]
   21     2 root     SW       0   0%   0% [rcu_tasks_kthre]
  371     2 root     SW       0   0%   0% [oom_reaper]
  372     2 root     IW<      0   0%   0% [writeback]
  387     2 root     IW<      0   0%   0% [kblockd]
  523     2 root     SW       0   0%   0% [ion_system_heap]
  537     2 root     SW       0   0%   0% [watchdogd]
  642     2 root     SW       0   0%   0% [kswapd0]
  759     2 root     SW       0   0%   0% [vsync proc 0]
  760     2 root     SW       0   0%   0% [vsync proc 1]
  833     2 root     IW<      0   0%   0% [uas]
  871     2 root     SW       0   0%   0% [rc0]
  889     2 root     SW       0   0%   0% [irq/42-mmc0]
  891     2 root     SW       0   0%   0% [irq/202-4020000]
  910     2 root     IW<      0   0%   0% [ipv6_addrconf]
  953     2 root     IW<      0   0%   0% [goodix_wq]
 1130     2 root     IW       0   0%   0% [kworker/1:2-rcu]
c
There's definitely stuff that can be done about that.
Off the top of my head the first thing I might go for would be a way to launch adbd on-demand.
d
Yup, once you upload the file via the web interface, this should also start the process and start flashing (then a restart occurs, so you do not have to stop it)
This is also something that will be done, but there's one issue
What if we free up this process's RAM, use it for something else, and then want to start the process again to upgrade the firmware (the whole ubi0_5 or ubi0_6 partition, plus the other 1 (out of 9?) that's being flashed with this)? We'll have to think about this too, and possibly stop some services before running adbd
I have all of this in my head already 😛 Sometimes I think I put too much thinking into it already instead of starting working on it, but from the other side the more ideas, the clearer the direction is 🙂
c
I'm sure the idea of reimplementing the bare minimum adbd functionality in a more lightweight fashion has already occurred to you?
d
Kind of. A brief idea - yes, but nothing more for now. I did not do any research on whether any alternative could be used. This was actually one of the first things that came to my mind when I saw how much memory it is using.
Another thing is to check what exactly consumes this much of the memory in the bmc process. I guess a lot of that could potentially be reclaimed. I'm almost sure some memory chunks can potentially be allocated and never used or allocated in advance instead of more dynamic memory allocation when possible.
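One possible starting point for that check: the VSZ column in the top output above is virtual size, which also counts memory that is mapped but never touched, so the 47288K and 45576K figures may overstate what adbd and bmctest actually hold in RAM. A minimal Python sketch that compares VmSize with VmRSS, assuming a Linux /proc filesystem and a Python interpreter on whatever machine runs it (the stock BMC image may not ship one); `parse_vm_fields` and `report` are hypothetical helper names:

```python
import os

def parse_vm_fields(status_text):
    """Return the Vm* fields of /proc/<pid>/status as {name: size_in_kB}."""
    fields = {}
    for line in status_text.splitlines():
        name, _, value = line.partition(":")
        if name.startswith("Vm"):
            parts = value.split()
            # Lines look like "VmRSS:      1234 kB"
            if parts and parts[0].isdigit():
                fields[name] = int(parts[0])
    return fields

def report(pid):
    """Print virtual vs. resident size for one process (pid may be 'self')."""
    with open(f"/proc/{pid}/status") as fh:
        vm = parse_vm_fields(fh.read())
    print(f"pid {pid}: VmSize={vm.get('VmSize')} kB, VmRSS={vm.get('VmRSS')} kB")

if __name__ == "__main__":
    # Demonstrate on the current process; on the BMC you would pass
    # the PIDs of adbd and bmctest instead.
    if os.path.exists("/proc/self/status"):
        report("self")
```

If VmRSS turns out to be far below VmSize, the real saving from trimming either process would be smaller than the top listing suggests.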
c
Judging only from how the C is written... I am very sure there's lots of low-hanging fruit memory cleanup to be done in the bmc process.
s
Hi all, I'm that guy who forked the legacy wenyi0421 repo to the turing-machines space. I'm working for Turing Machines; in particular I will be involved with the BMC firmware. Probably going to say too much here... but everything for the best firmware possible 🙂
We are very much interested in, and committed to, building a strong community around the new firmware. So I'm open to input and would very much like everybody to get involved.
We have decided to build upon the firmware we have right now, rather than start from scratch, for a couple of reasons:
* I don't see us using something other than a buildroot/Linux kernel setup.
* I don't want to be underwater for months developing and stabilizing, not knowing when I will be on the other side.
* We have limited resources.
* I do intend to accept and make fundamental changes, but want to keep the mainline shippable.
Yes, the current firmware is in rough shape, but I think with a little love we will have it patched up in no time 🙂
The first goal is to rebuild the daemons in the firmware. We will build them next to the legacy ones. Once they are feature complete, we will decide whether it's worth continuing to support the old API, or whether we can unplug them to make space. To be more specific, we intend to build a gRPC daemon, which will supersede the bmc application.
The work @DhanOS (Daniel Kukiela) is doing with the "CE" version is very cool to see; I hope we can consolidate some of that work into our repo in the future. Work on the kernel image is much appreciated as well; my knowledge here is, well... kinda rusty.
I hope that cleared up some of the confusion. Feel free to message me if you have any questions.
c
I'm more than happy to take care of kernel/image/driver/bootloader things and leave userspace up to you and others. In particular I should say that my design philosophy is to have Linux manage the devices as much as it can, freeing up userspace to concern itself with the high-level. For example, userspace should not "set GPIO #99 high" it should "tell Linux to turn on the 'atxpwr' device" - I believe this results in easier debugging, clearer code, and allows hardware revisions to change the design with only an updated DT (no source code changes needed). I'm also offering to rewrite the whole current bmc daemon as-is from C to Rust if you'd like, then we can work on phasing out the legacy API. It should take me a weekend to get working, plus another day or two to clean up. It's up to you whether we migrate to Rust incrementally or all at once. My big weaknesses are in API design, web development, very thorough testing, and sometimes I spin my wheels if I'm given too large or too vague of a task. Beyond that though, feel free to throw any work or problem you need solved my way. I'm very eager to bring this new design to life! 😀
w
Definitely a big fan of the pragmatic "hey maybe let's not just throw out a big old pile of working software for the sake of 'Rust'" approach, glad to see some movement!
s
I tend to agree with your philosophy; I think in terms of power regulation and node switching you're definitely right. Let me get back to you to see how we could get this collaboration going 💪
w
If it was up to me..... you're hired! 🙂
t
Installing and uninstalling a microSD for the BMC isn’t a trivial process; it involves uninstalling the whole mainboard, so I at least plan on leaving a card there all the time. I don’t know about others. (Actually had a 128GB one I bought some time ago on clearance for no particular reason.) 😉
c
I have mine pretty accessible and may or may not use it as a sort of "gold master" eMMC, so I can quickly reimage all of my nodes by swapping out the microSD card.
w
Ditto for mine. And I only put a 32GB in there because that was what I had to hand when assembling the system.
t
I know some folks have their Turing Pi 2 installed in a case where they can remove the rear panel to access the bottom for M.2/microSD access, but mine is in a SuperMicro CSE-512 1U case, so I'm not so lucky.
oops, different case, (was thinking of my TP1), but still a 2U so I can't access the bottom without removing the board
how big a microSD for the BMC are we expecting for a "nominal" size?
my CM4's only have 32GB ones
c
My case is a modular stacking thing that I 3D printed, so to access the microSD I just pop the middle layer open
t
sounds neat... and convenient
c
I have a photo of it in #754950670645329920
t
I have a 42U rack chassis in my living room (small apt) so I want to make use of it
c
42U rack filled with passively-cooled things is the dream 😍
d
I want to put my boards into a rack case at some point too. For now I would not be able to do anything of what I'm doing if I do so 😄
Mine will not have passively-cooled things, but I can control the heat and put it out during the summer and keep inside during the winter (well, I will be in ~2months when i finally finish the renovations) 😄
t
@DhanOS (Daniel Kukiela) knows what I've been working on for my TP1
d
Yeah, and the progress is quite nice 🙂
t
my 3D parts were shipped off today!
sorry, topic drifting
w
I spent like half an hour trying to get it in, my wife got it straight away. Go figure
t
is it intended for the BMC microSD to be removed to be written to, or should it be remotely writable through the BMC?
w
I mean it is as a matter of fact remotely writable through the BMC
Btw would be neat to have the firmware put /mnt/sdcard/bin in PATH if it exists.
I manually edited /etc/profile but that will be blown away if I flash it
I was trying to figure out if there was a good extra bundle to install to the SD card with full versions of all the usual utilities that live in busybox
find, file, GNU tar, curl and wget with TLS support and a set of root certificates, etc
I suppose GNU coreutils is a start
t
could you recommend a nominal SD card size? I suspect the 128GB one I was thinking of using would be a bit too big
or unnecessary
d
It depends on what you are going to use it for. Once flashing from the BMC is possible, you might be storing several OS images on the card too, and some can be multiple GB in size
t
I suppose that depends, conversely, upon what the BMC ends up being capable of
w
Idk how long is a piece of string? If you want to just have extra storage for binaries and stuff, even 4G is probably tons. If you want to flash from it, at least 16G but probably more
I found that 32G was about as cheap as SD cards get and seemed plenty big
t
lol
was just looking on Amazon and 64GB is as low as I can find atm for the @geerlingguy -approved Samsung EVO+
w
I would not overthink it. The BMC SD card is not a highly demanding application. The most demanding thing you'll do with it is flash a module, and you're not going to do that every day
t
sorry, I'm a perfectionist and a little OCD
d
I said a similar thing in a thread in #754950670175436848
But I also understand the perfectionism and OCD - have both, too
t
I mostly don't want to remove the board to replace it if it gets corrupted, wears out, etc.
w
Think about this as perfecting your efficient allocation of resources to areas that need it. It's inefficient to spend extra money on performance that isn't necessary.
t
true
w
Though I understand that whether or not a diagnosed condition is involved, it's not that simple of a mindset shift
t
I turn most things into an intense problem-solving operation; it's kind-of fun
d
The odd thing about perfectionism is that if you (or at least I) happen to have different SD cards lying around, I would not hesitate to use any of them, but while shopping for one a different mindset turns on, and even if you know you don't need the performance, you kind of feel this is the way
t
yeah
d
What helps me with my perfectionism (and OCD as a part of it) is to have someone to push me. For example, I felt ill and very uncomfortable while releasing a project to the public that I knew was far from perfect. I would never have released it in the state I was pushed to release it in. And this was actually something I'd done for work, so I had to release it anyway. I feared what people would say and... no one said anything. This experience helped me and I can kind of release incomplete and imperfect projects now, but I'm not fully over my perfectionism and it sometimes still hits hard
c
I tended to be shy with my code when I first got into software dev for similar reasons.
"Sure I'll release binaries but *no way am I sharing my source aaaa*"
d
Yeah, this kind of things too 🙂
w
We're way off the reservation, but it stands to reason that if you can be pushed to fear embarrassment by a negative experience, you can be pushed to feel okay about it by a string of positive ones. A kind of benign corollary to normalisation of deviance
d
It's not like that person just pushed me. We've had a few talks, they did a good job convincing me, making arguments and keeping it all positive. But it does not work this way. I felt what I felt anyway
u
Hi, I tried to build the TuringPi CE image but the image has ~70Mb in the end and is not working 😕 I also get an error creating the MBR, I guess somehow the rootfs is too big. But I did not change the config files. Any idea where to look at?
-----------------update_mbr--------------------------
mbr count = 4

partitation file Path=/turing-pi-2-community-edition-firmware/buildroot/output/images/sys_partition.bin
mbr_name file Path=/turing-pi-2-community-edition-firmware/buildroot/output/images/sunxi_mbr.fex
download_name file Path=/turing-pi-2-community-edition-firmware/buildroot/output/images/dlinfo.fex

mbr size = 16384
mbr magic softw411
disk name=boot-resource
disk name=env
disk name=env-redund
disk name=boot
disk name=rootfs
ERROR: dl file rootfs-ubifs.fex size too large
ERROR: filename = rootfs-ubifs.fex 
ERROR: dl_file_size = 67536 sector
ERROR: part_size = 65536 sector
update_for_part_info -1
ERROR: update mbr file fail
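Reading the numbers in that error: the rootfs image is larger than the partition it has to fit into. A small Python check of the arithmetic, assuming the usual 512-byte sector (the log itself does not state the unit):

```python
SECTOR_BYTES = 512  # assumption: the log does not say how big a "sector" is

dl_file_size = 67536  # sectors, from the "dl_file_size" ERROR line
part_size = 65536     # sectors, from the "part_size" ERROR line

overflow = dl_file_size - part_size
print(f"partition: {part_size * SECTOR_BYTES / 2**20:.2f} MiB")
print(f"image:     {dl_file_size * SECTOR_BYTES / 2**20:.2f} MiB")
print(f"overflow:  {overflow} sectors = {overflow * SECTOR_BYTES / 1024:.0f} KiB")
```

Under that assumption, rootfs-ubifs.fex overshoots a 32 MiB rootfs partition by 2000 sectors, or about 1000 KiB, so the place to look would be what ended up in the rootfs (or the partition size), not the MBR step itself.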
d
I haven't seen such thing. Which OS are you using to build the image? I'm using Ubuntu 22.04
u
Ubuntu 22.04 amd64 (in Docker on a M1 Mac) but I was able to build wenyi0421/turing-pi with that.
d
The CE is pretty much the same firmware; it's forked from the original firmware with just a few things changed, like SSH root logins enabled or a static MAC. If you were able to build the original firmware, you should be able to build the CE
u
Ok, then I think something went terribly wrong during the build 😆
t
I'm curious when the next CE image is likely to land on github. I'm still rocking the factory installed one with some mods that will probably get wiped away with the upgrade (MAC address, init.d script for custom boot order/hw clock sync, etc) and had been waiting for the next build to make the leap. If it's only days or weeks away I can wait patiently, but if a slightly longer time frame I might as well bite the bullet and do the upgrade now before I forget the little tweaks I've made.
d
Currently, I'm watching the progress of the new firmware; mostly, there's a big PR that is either going to be accepted or not. It'd be good to keep the CE close to the official firmware for multiple reasons. There's not much point in re-doing some or most of the changes if this PR is going to be accepted.
t
excitedly waits
s
@DhanOS (Daniel Kukiela) i did already multiple test runs with the new DTS PR. It will be accepted, my only todo is to check the diff with the old version 👍
d
Oh, ok, thank you for letting me know! Since I forked from the original firmware, if the PR will land there, I could just easily import the changes
s
The restructuring is already in master, even though i closed the PR. I guess i had too long of a vacation and forgot how to fast-forward a PR properly. I rebased it instead
d
But you mean the tpi fork?
s
Yes, i might help you to push it upstream. But im not sure what the expected lifetime of this repo will be 🧐 tbc
c
I'm probably going to add the OTG 5V GPIO and do some more cleanup before I take it off draft
s
*might be able
Im intending to enable your gpio keys as well, but need to write some rust first to support it 😀
c
Oh exciting, fewer GPIOs to manage from userspace is always worth celebrating!
d
This all really makes me wonder about the future of the CE firmware. You guys know much more about this than I do, and while I can do a lot here, it's definitely not at your level (like the changes that @CFSworks is adding at the lower level). This topic is full of ideas, feature ideas, and questions about various fixes. I'm starting to consider running this project off of the new official firmware fork, but the problem for me is that I don't know Rust, and even though I've used many different languages in my life so far (starting with BASIC), each takes a while to learn well enough, and it won't be much different for Rust. I really have to re-think the CE and what it would/could/should be. I guess not all of the ideas will make their way into the official firmware, and this is where the CE might be a solution, but it would need to stay close to the official one and pull from it regularly.
c
Hey you started with BASIC too! I learned on Interplay's BASIC dialect back in the late 90s and moved on to QBASIC then VB ([shudder]) so it's cool to see someone else with the same beginnings. 😄
t
I started with Applesoft and Integer BASIC on the Apple II and then MS’s Basic Interpreter on the Macintosh ( drifting to #754950670645329920 😉 )
d
Yes, I started in 1990 (when I was 8). I had an Atari 800 XL and the Internet was not a thing yet. I learned from magazines (Bajtek in Poland). Then I used QBASIC too, and I remember struggling with Turbo Pascal initially (because of the lack of the concept of line numbers 😛 )
c
Anyway: My approach is to make the "OS" of the BMC firmware behave in a simple and intuitive way (if I had my way, the USB MUX would be configurable with like...
echo node3 > /sys/.../endpoint1; echo port > /sys/.../endpoint2
though that might involve writing a driver) and I very much want not to touch userspace, so there should still be plenty of room for enhancements there.
s
I hope the official firmware becomes the community edition. As in, it is there to serve the community. Even though I encourage you to go for your own projects and experimentations. 😉
c
^ My hope is also that the effort around the CE project soon gets redirected to the new official firmware, though I am also predicting that the official firmware will be re-forked into a new "CE" repo that contains stuff not appropriate for upstream.
d
This is indeed more of how I am starting to see this
c
One of the reasons I think the latter is because there's talk of adding a bunch of functionality that won't fit in the NAND and make it boot off of a microSD instead. I'm pretty sure TMs wants the official stuff to fit in NAND so it can be shipped from the factory, but a community project to put all of the extra stuff in microSD sounds great for many who prefer that.
d
And in this case the more it should fork from and stay close to the official repo
s
That doesn't have to mean it needs to be forked. E.g. you can decide to let the official repo build several build targets or images for that matter
d
My idea around this is to move the root partition onto the SD card and never have to flash anything again
c
Ah, so a "core"/"lite" target (which is what fits in NAND) and an "extended"/"external" target (which only produces microSD images)?
Speaking real quick of NAND, before I forget: @svenrademakers could you do a test build and flash with the kernel config option for "simulate multiplane" turned off? I can get into the rationale in a GitHub issue but since it affects the on-NAND storage layout in kind of a dumb way, I'd like to get that turned off sooner rather than later.
CONFIG_AW_SPINAND_SIMULATE_MULTIPLANE
s
Except your sdcard :p? But if you want your rootfs on the sdcard then you're right. I was thinking more of extending your nand with the sdcard. I have a similar setup on my openwrt router
d
What exactly do you mean by extending? And why not put the whole root onto the SD card in this case?
c
@DhanOS (Daniel Kukiela)'s idea is to have 100% of the OS on microSD, nothing on NAND (except a backup OS install perhaps), and manage the installation like a conventional desktop Linux install w/ a package manager
Probably "extending" means that there's either an overlay or a strategically-placed mountpoint so that additional files added to your BMC are placed on the microSD and they simply "go away" if the microSD is unplugged
s
Yes i got this. The reason why i would have the os on the nand is that i dont have to reflash my board when it comes out of the factory. The package manager is running on the sdcard
Yep, so for instance i think on my router everything on my usb drive gets mounted into /opt and the package manager is configured to install into that
d
Wouldn't you need to re-flash it on the firmware update anyway?
And yes, this type of extending is also a possibility. You can add paths to PATH, etc. But that would not make the updates easier and I think people would like to not have to flash the firmware anymore. If the card breaks, you put another in and you're all set again, but this is the only moment you'd have to flash anything (unless we find a way to be able to initialize the card from the firmware)
Someone even proposed to use the remaining flash for the permanent storage (like for the config files), which also makes sense
s
I think both have their pros and cons.
d
Yeah, I agree
I can also see how your idea better fits the goals you need to reach
c
To cite OpenWrt as an example for how they do things: a flash does erase and recreate the whole squashfs+overlay, but since it keeps all of its configuration in /etc/config, the upgrade process will grab that (+ other files that you have requested it not blow away) and move them over for you. (Or you can check a box to ask it not to do that and get a fresh install.)
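The same save-then-restore pattern can be sketched in a few lines of Python. This is only an illustration of the idea, not OpenWrt's actual upgrade code; `backup_config` and `restore_config` are hypothetical names, and the list of paths to keep is up to the caller:

```python
import os
import tarfile

def backup_config(paths, archive_path):
    """Pack the given files/directories into a gzip tarball, skipping missing ones."""
    saved = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in paths:
            if os.path.exists(path):
                tar.add(path)  # directories are added recursively
                saved.append(path)
    return saved

def restore_config(archive_path, root="/"):
    """Unpack a backup made by backup_config() underneath `root`."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(root)
```

Before a flash one would call something like `backup_config(["/etc/config"], "/tmp/keep.tar.gz")`, store the archive somewhere that survives the flash, and call `restore_config` on the first boot of the new image.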
s
So why do you like to avoid flashing? Let's say we had a perfect OTA update system with which you could flash the latest kernel with the press of a button. Would that take your concerns away?
c
My only concern here is I want to be able to upgrade/reboot/flash the BMC without it cutting power to my running nodes
d
It's not like I'm fixed on a single idea. I thought of putting the root onto the SD card, having the other "lite" version on the flash and keeping, for example, /etc on the flash as well
People seem to have some issues with the flash, like bad blocks, etc
The SD card can be easily swapped if it wears out
I see the flash like a thing that should not be updated unless strictly necessary to avoid issues
s
Fair point
d
We can go even a step further
The BMC can be booted off of the SD card entirely
c
(Raspberry Pi style)
d
The CE might go this route and have a full partition layout and not even require the flash to be in a working condition
I just don't know 😄
I kind of feel like the flash would do great for most users, and the tinkerers would go more the SD-card way
s
Well having the option to do so would def be awesome 😎
d
Maybe the official firmware could be made this way?
s
Im all open to it
d
Like even if someone's flash dies, they can grab an SD card and continue using their board again in minutes?
c
I can't speak for TMs but I'm predicting what they want (eventually) is a smooth out-of-the-box experience: pull the TP2 out of the antistatic plastic bag, plug in a PSU and Ethernet, boom you have a web interface.
Imaging a microSD isn't too much out of the way (after all, the target market for the TP2 is Raspberry Pi users who are used to that), but it may still be an added step that isn't very desirable
d
This is why you'd have 2 or 3 versions of the official firmware: - lite on the flash (the default) - lite on the SD card - fully-featured on the SD card
s
So we have already 3 versions: - lite. factory standard - recovery. Factory tools and utils - dev edition. Full sdcard image
c
LOL
s
😂
d
😉
c
Well you agree on A and C but your Bs differ slightly
d
A dev edition is a thing already? Someone has prepared something that boots off of the SD card?
c
For what it's worth I view the lite and recovery as the same "version" -- the recovery is just the "installer" for the lite. (i.e. both images would be built when BR2 is set to that config target)
s
To make it more crazy, you could let the user decide to upgrade a lite version into a full SD version by checking if there is an sdcard mounted. It would pull the latest sdcard image from the internet and flash it
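The "is there an sdcard mounted" check could look roughly like this in Python. It is a sketch under assumptions: /mnt/sdcard is the mount point mentioned earlier in this thread, and `is_mounted` is a hypothetical helper that parses /proc/mounts-style text:

```python
import os

def is_mounted(mountpoint, mounts_text):
    """True if `mountpoint` appears as a mount target in /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Format: <device> <mountpoint> <fstype> <options> <dump> <pass>
        if len(fields) >= 2 and fields[1] == mountpoint:
            return True
    return False

if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as fh:
            present = is_mounted("/mnt/sdcard", fh.read())
        print("SD card mounted" if present else "no SD card mounted")
```

Only the detection step is shown; downloading and writing the image would come after it and is left out here.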
d
And copy the settings over from the flash
This is kind of the direction I like
c
Yeah, that's a smooth out-of-box experience, but also gets the NAND flash unused beyond that very first boot.
d
Flash could be used as a permanent settings storage (in this case)
And will let you flash a new card if one you have dies
c
Yeah, we can put a UBI partition that only stores configs
d
And the flash still will be used by most of the users
SD card is for those who want more
c
I'm definitely in the NAND camp. Much as I'm excited about BMC firmware dev, I am not going to be using the BMC for much haha
d
After this discussion I think I want the CE to fork from the official repo and I might even start learning Rust. I most likely see the CE pulling from the official repo, adding more, and even pushing back to the official repo through PRs
c
@DhanOS (Daniel Kukiela) I think even if you ultimately decide you don't like Rust, it's still good to learn. It has a few restrictions (the "mutable references are exclusive" one leaps to mind) but I found learning to work with the restrictions made me a better programmer.
d
I went through many programming languages. Each has own restrictions and flaws and you either work with them in mind or around them 🙂
c
I'm gonna head out and relax my brain a bit for the evening after fighting with the weird NAND layout imposed by "simulate multiplane" but I'll get caught up again later. Bye all! 👋
s
Cheers!
d
o/
w
So excited to see this land
The future of "my board isn't booting, let me quickly mount the emmc on BMC and poke around, see if I can't figure it out" is calling me
t
Thanks! I'll push on with an update next time the kiddo is asleep and I'm chore free.
c
As a stopgap until TMs starts assigning permanent MAC addresses to the board, it might be good to randomize the temporary MAC address deterministically from the sunxi serial number. For inspiration, here is how U-Boot itself does it:
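(The linked U-Boot code is not reproduced here.) As a rough illustration of the idea only, and explicitly not U-Boot's actual algorithm, here is a Python sketch that hashes a serial-number string into a stable, locally administered MAC address. The function name and the choice of SHA-256 are assumptions made for the example, and where the sunxi serial is read from is left open:

```python
import hashlib

def mac_from_serial(serial: str) -> str:
    """Derive a stable, locally administered unicast MAC from a serial string."""
    digest = hashlib.sha256(serial.encode("ascii")).digest()
    octets = bytearray(digest[:6])
    # Set the locally-administered bit (0x02) and clear the multicast bit (0x01)
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{b:02x}" for b in octets)

if __name__ == "__main__":
    # Placeholder serial, not a real board's value
    print(mac_from_serial("0123456789abcdef"))
```

The same serial always yields the same address, so a board keeps its MAC across reboots and reflashes, while two boards with different serials are very unlikely to collide.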
w
nice
s
@DhanOS (Daniel Kukiela) @CFSworks i created 2 placeholders after our discussion of last friday. Im a bit exhausted typing, but I would very much like some feedback: https://github.com/turing-machines/turing-pi/issues/34 and https://github.com/turing-machines/turing-pi/issues/35
d
About #34, I thought we agreed that it'd be more convenient to prepare a standalone SD card image (that the BMC will boot off of instead of the flash) since that'd be a base for the CE and also a solution for people with broken flash chips.
c
If we're willing to be slightly more ambitious, I think it might be a cool feature if the NAND edition of the firmware could (upon the user's say-so) move itself to the SD card, with all files and settings intact.
s
If I remember, your biggest argument for having this was so that you could easily do firmware updates, i.e. just flash another SD card? But I'm not sure we fully reached a decision here, now that I think of it.
- With an overlay, you could actually very easily update the base image without losing your work on the SD?
- Having it as an overlay would be more maintenance-friendly, as the configurations are closer to each other. (I'm contemplating having an overlayfs on the flash by default as well.)
- I'm not convinced, or I surely hope not, that the number of users with broken flash is anywhere close to significant.
sounds like a very big stretch for now 😛
mmhh maybe im talking bs now. i need a break my brain is fried 🙂
d
I've seen multiple people reporting bad flash blocks, and there were a few RMAs because of this too. Having an overlay would not necessarily make updates easier, since you'd still have to update binaries and re-flash the flash. The idea was a full OS on the SD card that you can update like a regular Linux system. CFSworks then added to that (and also repeated it minutes ago) the ability to copy the flash onto the SD card for an even easier start. Then, the idea was to use the flash as permanent storage for the configuration files (in case the SD card breaks or you want to replace it). There were a few more ideas, and I really liked them and the idea of creating an SD card base image that the CE can extend
c
On that latter point, we do need to invest some energy into turning off SIMULATE_MULTIPLANE, because it changes the flash geometry from 1024 blocks of 128KiB to 512 blocks of 256KiB, which is frustrating UBI's ability to wear-level and doubling up the erase operations.
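To put numbers on that, a quick Python check using only the figures from the message above (the `geometry` helper is just for illustration):

```python
KIB = 1024

def geometry(blocks, block_kib):
    """Summarize a NAND layout given its erase-block count and size in KiB."""
    return {
        "blocks": blocks,
        "erase_block_bytes": block_kib * KIB,
        "total_mib": blocks * block_kib // 1024,
    }

native = geometry(1024, 128)    # SIMULATE_MULTIPLANE off
simulated = geometry(512, 256)  # SIMULATE_MULTIPLANE on

for name, geo in (("native", native), ("simulated", simulated)):
    print(f"{name:>9}: {geo['blocks']} blocks x {geo['erase_block_bytes'] // KIB} KiB"
          f" = {geo['total_mib']} MiB")
```

Total capacity is 128 MiB either way; what changes is that UBI sees half as many erase blocks, each twice as large.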
d
I thought the last point from your list was your agreement 😄
s
thanks for re-iterating 😛 its coming back now
d
👍 🙂
c
Also yeah I think the idea is just to use the SD as if it were the NAND flash. Maybe with real partitions instead of mtd+ubi but otherwise the same layout
d
"Real" partitions 😄 But yeah, ext4 (for example), and the layout could or even should be slightly different; for example, we would not need 2 copies of the root filesystem (like ubi0_5 and ubi0_6 in the flash)
c
Kernel should probably also be kept in the filesystem rather than its own partition (on both NAND and SD)
s
yes exactly this is where my confusion was. so we would only flash the partition/section with the vanilla firmware on the SD
d
Yes, the same firmware with a slightly different partition layout, but otherwise the same files. So this would serve 3 purposes: - a workaround for broken flash - a dev platform or a platform for those who want to tinker - a base for the CE
The overlay would work nicely to extend the official firmware using the SD card, but that'd be its only purpose. It won't help those with a broken flash and would not really help with the CE (although I'm not expecting the official firmware to do things specifically for the CE, even if it'd be nice if both firmwares worked together nicely).
c
The "pack up and move to the SD" (which would essentially be the same logic as recovery, just in reverse) mechanism should also take away the one advantage the SD overlay had: it's easy for the user to enable.
d
I'd say this really depends on the implementation. Preparing the SD card for running the whole OS could still be as easy as a single click, and once you remove the card you're back at the lite version of the firmware. But I do, of course, see how the overlay would be easier to initiate. I also think that most of the people who will use the SD card will be the tinkerers, who won't mind a slightly less convenient initialization method.
c
True, they could always wget/dd the extended firmware image on the BMC to the SD in-situ
d
Or flash the SD card externally and put into the TPi2
Or even the web interface could have a button to download the image and flash the SD card (using dd or whatever)
The CE most likely would require some manual steps since I don't think the official firmware could have a way to download and flash this version 😉
c
@DhanOS (Daniel Kukiela) Do you think you'd have some time today to try a build with SIMULATE_MULTIPLANE turned off (both in kernel config and a header in uboot)? I know I'm sounding like a broken record on this but it's really complicating access to NAND flash and it's likely the first step in simplifying the firmware boot/flash process. I'd look more into it myself but I'm about to go on a trip and I want my board in a stable state so I can hack on it remotely. (That and I can't quickly get either of the Suits working before I leave.)
d
You'd want me to build a firmware with a few changes? The official one?
c
You may also have to erase mtd3 for the update to take. I don't really know what part of the update process turns the rootfs and kernel into ubi volumes or if it'll balk at having that partition suddenly "corrupted" by multiplane being turned off
Well, build and then flash and see how much grief it gives you
d
If you tell me what exactly you want me to change and where, or if you can send a modified repo or a GH link, I can compile it and flash my board with it. Compiling for the first time takes a lot of time, since some sources are being downloaded and some of them take hours to download. I could use my local cache, but some people mentioned they cannot build the CE, so re-downloading the sources might also be a good idea, since what I get with my cache would not necessarily be what others would get if the sources have changed. I'll leave it to you to decide whether I use my cache (which would make compiling much faster) or re-download the sources, in which case I might not have the firmware today.
Either way, yes, I can compile and flash the firmware and see if/how it works and share the images with you
c
With any version of the firmware you like, after doing a complete build, cd to buildroot and: - Do
make linux-menuconfig
, search for (and turn off)
AW_SPINAND_SIMULATE_MULTIPLANE
- Edit
output/build/uboot-69b04a0b3dd5c412f66e9dbfd02876eebfd99646/include/linux/mtd/aw-spinand.h
and find the line
#define SIMULATE_MULTIPLANE (1)
, switching it to
0
- Rebuild both and image:
make linux-rebuild uboot-rebuild target-post-image
- Flash new
buildroot_linux_nand_uart3.img
to NAND on one of your boards - ... see what happens and try to make it work and report back on how difficult the flashing was
That last step is the vague one since I don't know what will happen at that point.
d
I'm guessing img + PhoenixSuit is the only way to flash it? I'm asking not only for the purpose of this test, but also because of how others will flash the firmware with such changes. Could using an SD card to boot from some sort of firmware upgrade image be a thing? People already have too many issues getting PhoenixSuit to work, not to mention not everyone has a Windows PC.
I'll run all of this as soon as I can and will test it with one of my boards
c
I'm at this point fully convinced that the official upgrade installer for the new firmware will be the recovery SD 😂
But maybe we can show
sunxi-tools
some love and have a script to do the same for people who can't easily access the SD
But, yes, PhoenixSuit needs to go down in flames -- and stay dead 😉
c
There is an ancient repository https://github.com/jankowskib/FELix but I haven't tried it
c
Ooh, could be worth a shot. I don't have high hopes though -- the T113 is so different from most of the sunxi chips (it's actually more closely related to the D1 but with an ARM core instead of RISC-V) that none of these older tools are at all likely to work. 😦
Apparently there's a patchset for Linux that implements MTD-in-userspace. If I had my choice of simple tool, it'd just be something that grants me access to the NAND flash over USB, making it available for me to hack on via that mechanism.
d
This is where my knowledge is limited. What does one really need to flash the T113-S3? Is access to the SPI flash (in our case) all you need, or does the processor have some registers you also need to set? If it's "just" the flash, booting from the SD should make it fully accessible, right?
If anything, you could access the SPI interface directly and flash the chip this way. This should not be this hard.
@costa-al This recovery SD card that's been recently added to the repo after I asked for a way to let people with an unbootable BMC flash their boards: is this something you just got from the chip manufacturer, or are there any sources to compile such an image (like the sources for the firmware)?
c
As far as I know if you were to replace (desolder, swap, and resolder) the T113, the only thing that would change is the sunxi_serial. The T113 does contain some eFuses which affect boot but they're mostly concerned with the boot order (SD->NAND->FEL, ...) and I think we use the default ordering anyway. So you really just need SPI working and you can fully reflash the BMC firmware that way. (Heck, you could possibly even get a clip that can attach to the 8-WSON flash chip package and fully reflash that way without even needing to power up the TP2!) The various mtdX devices are also just an arbitrary partitioning decided by the AWNAND driver. It gives 1MiB to boot0, a configurable (via
aw-ubi-spinand.ubootblks=
kernel parameter) number of blocks to U-Boot, 1MiB to "secure_storage" (we don't use it), and the remainder to "sys." (The fixed partitioning is one of the reasons I want to move away from AWNAND to the vanilla Linux driver.) The "sys" mtd is a UBI layer to do wear-leveling and further subpartitioning (which you probably already know about) and then within we of course have the U-Boot env, kernel, and primary/fallback rootfs.
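To make the split concrete, here is a minimal sketch of the arithmetic. The flash size (128 MiB) and the per-block size (256 KiB) are assumptions for illustration only and are not confirmed anywhere in this thread; `ubootblks=24` is the value that shows up in the bootargs later on, and `mtd_layout` is a hypothetical helper name.

```shell
# Sketch of the fixed AWNAND split: boot0, U-Boot, secure_storage, sys.
# ASSUMPTIONS (not from the thread): total flash size and block size.
# All sizes are in KiB.
mtd_layout() {
    total_kib=$1      # assumed total flash size
    block_kib=$2      # assumed size of one "block" as counted by ubootblks
    ubootblks=$3      # value of aw-ubi-spinand.ubootblks=

    boot0=1024                               # 1 MiB for boot0
    uboot=$(( ubootblks * block_kib ))       # configurable U-Boot area
    secure=1024                              # 1 MiB for secure_storage
    sys=$(( total_kib - boot0 - uboot - secure ))  # remainder goes to UBI

    echo "boot0=$boot0 uboot=$uboot secure_storage=$secure sys=$sys"
}

mtd_layout 131072 256 24
```

With different assumed sizes only the `uboot` and `sys` numbers move; the two 1 MiB areas stay fixed, which is the inflexibility being complained about.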
Writing to NAND from SD is very much doable but we'd need to use a SPI-NAND driver with sensible block ordering (i.e. either upstream Linux's
spi-nand
driver, or AWNAND with
SIMULATE_MULTIPLANE
turned off). The boot0 and uboot are stored in the normal "linear" order so if we try to write to those mtds through the AWNAND+SIMULATE_MULTIPLANE driver it will end up all scrambled. (Unless we write it in a scrambled order so that it ends up non-scrambled on flash, but that's even more hideous and I just want to do away with SIMULATE_MULTIPLANE and call it done.)
c
As far as I know, no one has contacted Allwinner about this. It's most likely from public sources
c
What does it do, you put it in and it forces the T113 into FEL (USB recovery) mode?
If so it might just be this:
d
> (Heck, you could possibly even get a clip that can attach to the 8-WSON flash chip package and fully reflash that way without even needing to power up the TP2!) Yeah, that's for sure. 🙂 But that's not a standard part delivered with the TPi2 board 😄 The other part of your message is a bit over my head, but I'll catch up. If the flash could be programmed just by using SPI, that's totally doable, although it will require some additional work that the driver is already doing. One way or another, I'm sure we can make the flashing work without PhoenixSuit. I have too much on my plate and other unfinished things, but I might try to take a look, especially since I have 2 MangoPi boards and 2 SPI flash chips
It starts being visible to the PhoenixSuit
c
I spent way more of the weekend than I'd like to admit digging into the T113 boot sequence. I didn't really accomplish what I was trying to do so please feel free to pester me with boot questions so my prior research can be of some use. 😂
d
Oh I will. We have to figure out how to build SD card images and then push this to the official firmware repo so I can base the CE on this 😄
c
Sounds like goal 1 is to get just a "naked" U-Boot up and running off of an SD card. We can do that the super-quick way (using Allwinner's prebuilt boot0 and U-Boot) or the clean way (hunting down the T113 patches for U-Boot, applying them, and coming up with our own configuration ourselves, thus having more full control).
I'm an advocate of the clean way, but also this was pretty much what I spent my weekend trying to do haha
d
Well, me with my perfectionism would also say the clean way would be the way. 😄
c
Well, fortunately this weekend I wasn't aware there were already patches available for U-Boot (I was trying to hack on it myself and quickly got overwhelmed). We just need the latest U-Boot plus these patches: - -
I have a partial .config I was working on too.
d
I have a lot to learn here to understand this 😄
c
The big problem this stuff is solving is that the T113's bootrom (the code inside the T113 chip itself) is not sophisticated enough to bring up the DRAM controller. So instead of a nice 128 MiB of memory, it configures the CPU's cache to act as "RAM" and we only get a few dozen KiBs of memory -- which also puts a limit on how much code the T113 can boot from SD or NAND. The solution: the first thing that is loaded and booted is just a little stub program that brings up the DRAM controller, reconfigures cache to work "normally," does a few other chip init tasks, then loads and executes all of U-Boot (which is much, much bigger). Since its only job is system prep and loading a bigger program, it's called the "SPL" ("secondary program loader") -- though Allwinner seems to like calling it boot0.
So the funny thing is the "HELLO! BOOT0 is starting!" is printed even before DRAM is working.
d
Iiiineteresting. Thank you for sharing this super interesting (at least to me) information at how it works.
c
I have never worked on bootloaders this early in the hardware bring-up either. Usually DRAM, CPU, hardware, etc. are all sensibly initialized before the bootloader gets to run. DRAM bring-up is usually a "BIOS" type job.
d
I just discovered something
Since I have a Mango Pi, I decided to check its homepage again: https://mangopi.org/mangopi_mq
And I found this:
Actually, this is not what I thought it is
c
Yeah if we want to do it the quick way, we'd have to grab
boot0_sdcard.fex
, hex edit in some of the settings (proper DRAM timings, console UART=3, location and size of U-Boot), then just
dd
that at offset 8KiB in the SD card.
Allwinner must have a tool out there to fill in the proper boot0 settings and write to SD card but I don't know what it is.
U-Boot has its own SPL that replaces AW's boot0, and what I like about U-Boot's is you can give it all of the settings at compile-time, so it's ready to use as soon as it is built.
...it does look like PhoenixCard is that tool actually.
d
For the record:
Device Drivers -> Memory Technology Device (MTD) support -> sunxi-nand -> enable simulate multiplane
Other than that, just done all the steps and got the image file. I never used the PhoenixSuit so far, so this is what I am going to try now 🙂
c
It'll either work, or U-Boot won't see its env (and Linux will panic on boot), or it'll fail to flash
d
We'll find out soon 🙂
I need to write some instructions on how to use PhoenixSuit, I have had some issues making it work
Anyway, I flashed the board and it does not boot anymore. Time to hook up USB UART and see what's going on
It spits out a lot of these:
[15.228]Volume identifier header dump:
[15.232]        magic     31181006
[15.234]        version   221
[15.236]        vol_type  139
[15.238]        copy_flag 75
[15.240]        compat    170
[15.242]        vol_id    -871235584
[15.245]        lnum      0
[15.247]        data_size 16777216
[15.250]        used_ebs  -2080309248
[15.252]        data_pad  2013265952
[15.255]        sqnum     4503599660924928
[15.258]        hdr_crc   101011f1
[15.261]Volume identifier header hexdump:
[15.264]hexdump of PEB 850 offset 4096, length 126976
[15.283]ubi0 error: check_corruption: PEB 852 contains corrupted VID header, and the data does not contain all 0xFF
And then:
@CFSworks
I guess I need to recover with the recovery SD card now. If you have any other ideas to try, feel free to propose (but I won't try them now)
c
Are you able to interrupt U-Boot? Did it reflash U-Boot at all?
d
At this stage you need to tell me how to check things. I flashed the img, so I guess it did? But I'm not sure how to interrupt U-boot
c
It's either CTRL+C or esc during the "autoboot in 2 seconds"
I don't know which actually. @j0ju's theory is the shipped-on-boards U-Boot doesn't respect the interrupt hotkey but once you reflash through PhoenixSuit/LiveSuit, it installs a version that does respect the hotkey.
d
Ok, I see:
Hit any key to stop autoboot:  0
After all these errors
Let me try again
c
Oh 'any key' that's helpful 😄

https://tenor.com/1QlC.gif

d
Ok, I was able to interrupt the u-boot and I have a prompt waiting
c
The
efex
command will get you back to USB recovery mode if you want to recover the board.
Though right now I'm curious why U-Boot apparently is still using the "simulate multiplane" layout while the kernel is not. 🤔
d
It indeed did but I'm not sure what PhoenixSuit started flashing. Windows BSODed, though and I think it's related
I found some more references to "simulate multiplane" so I'll try to re-check them, flip, rebuild the image and try again. But I'm also not 100% sure what I've seen. Either way, I'll try to look at it later. For now it's time to get some sleep
c
Yeah we're pretty much in reconnaissance mode right now: "How much do we need to change to get this stupid flag turned off?" 😂
I don't know the solution.
d
Among the few skills I have, I'm also a problem solver. Once you know a way and share it, I might be able to make it work. Maybe, who knows 😄
Well, after this BSOD during firmware upgrade, I'm getting nothing on the UART anymore
c
It's probably still waiting around in
efex
mode
Unless you mean the UART is dead even with resets?
d
I power-cycled the board, but I'll check that
c
mm
d
PhoenixSuit does not see the device
Longer poweroff before power on and I see things on UART
c
I wonder if the BSOD during flashing gave it a corrupt NAND and the recovery SD is now necessary
d
New things, though
And another BSOD
I started seeing familiar output, I guess I could refresh but another BSOD
This is not a coincidence
Previously:
c
Yeah geez is PhoenixSuit doing the formatting and heavy lifting in kernelspace? Why is it BSODing
d
Now:
Driver issue this time
Ok, good news is I left the board powered off for the whole duration of my PC reboot and now it behaves like previously: I can interrupt U-Boot and
efex
works
I won't be flashing it again now to not wear out the flash unnecessarily
I'll try to search more about the "simulate multiplane" in the firmware and flash it with another image
c
Yeah and you might want to try erasing the NAND fully (I don't remember which U-Boot command does this)
d
I think I might want to finally solder the SPI flash socket onto the MangoPi and continue there instead of TPi2 🙂
c
Sounds fair haha
d
On the other hand, I have 10 flash chips I can replace on the board if it wears out
But it's always easier to replace it in a socket 🙂
c
btw here is a much smaller version of the same "force to FEL mode" SD image. You flash it with
dd if=fel-sdboot.sunxi of=/dev/mmcblk0 bs=512 seek=16
That means I was wrong here, btw: boot0 does need to be at a fixed location on the microSD, but it starts at sector 16. Sectors 0-15 are free to have whatever partition table you like (and thus you can partition the microSD however you like, you just need to reserve the first 1-4 MB or so for U-Boot). But it does mean once we get U-Boot on an SD, we can start figuring out what partition layout makes sense -- no particular restrictions apply. 🙂
s
amen
d
Will try it out! If it works, we can put it in the official firmware
I guess this is the answer:
./buildroot/dl/uboot/git/include/linux/mtd/aw-spinand.h:29:#define SIMULATE_MULTIPLANE (1)
Also, should this be flipped too?
./bmc4tpi/config/kernelconfig:983:CONFIG_AW_SPINAND_SIMULATE_MULTIPLANE=y
I'll assume we need to flip both
I don't think the above helped, same issue
Ok, this one does not matter (I mean it might, but not in this case)
Ok, I ran out of ideas for now. Will take a look at it later
c
So right now I think the 3 things we have to get
SIMULATE_MULTIPLANE
disabled in are: - The Linux kernel itself (that's easy, and we do have it right or it wouldn't be spewing UBI errors) - The boot-time U-Boot, which is packaged into
boot_package.fex
(and before that is
u-boot-sun8iw20p1.bin
) - The flash-time U-Boot, temporarily loaded into the T113 over USB by PhoenixSuit just to do the flashing operations, which is bundled in the .img (and before that is
dragon/u-boot.fex
)
I spent today getting these patches updated and trying a test build of the SPL. Did not work. Since I'm blind as to why, I'm now stepping it (vs. boot0) in an emulator to debug it.
d
I spent some time yesterday looking at this too and found nothing as well
c
UART>(2)
Raw UART input
Any key to exit

U-Boot SPL 2023.07-rc2-00024-g5be325423d-dirty (May 10 2023 - 21:15:29 -0600)
DRAM: 128 MiB
sunxi SPL version mismatch: expected 2, got 1
Trying to boot from FEL
U-Boot's SPL works, now I just have to look at getting it to load U-Boot itself.
Looks like it gets RAM working correctly too:
$ ./sunxi-fel readl 0x5ABCDEF0
0xaedef7f7
$ ./sunxi-fel writel 0x5ABCDEF0 0xDEADBEEF
$ ./sunxi-fel readl 0x5ABCDEF0
0xdeadbeef
A lot of what I'm doing is also getting U-Boot a little friendlier to building under Clang (my cross-compiler of choice) and finding a few interesting differences vs. the GNU toolchain. One that was causing me some bus errors (note the difference in alignment):
$ clang --target=arm-linux-gnueabi -c foo.S -o foo-clang.o
$ objdump -h foo-clang.o
...
Idx Name          Size      VMA       LMA       File off  Algn
...
  1 .text.test    00000008  00000000  00000000  00000034  2**0


$ .../arm-linux-gnueabi-gcc -c foo.S -o foo-gcc.o
$ objdump -h foo-gcc.o
...
Idx Name          Size      VMA       LMA       File off  Algn
...
  3 .text.test    00000008  00000000  00000000  00000034  2**2
Now I'm wondering if there's some flag I should be passing Clang so it sets a default alignment on any CODE section, or if U-Boot is at fault for leaving off the
.align
directive in its assembly files.
^ Everyone following this thread now
w
I've been using dockcross which is pleasantly plug and play
Just had to hack in the correct glibc
c
That reminds me: does everyone have a favorite libc? I feel like glibc might be a little heavy for the BMC.
j
Does that theory still hold? (Note: sorry, I am a bit off right now; work & travel do not play well with private hacking, the Turing Pi is a bit large for the luggage, and I do not want to know what might happen in customs when they find a naked TPi2 with some Pis and wild wires... 😉)
c
Haha I rigged mine up with a Bus Pirate and bound the debug USB port to a USBIP server so I can hack on mine from out of town 😂 But yes it does look like reflashed-uboot is substantially different from factory-uboot. btw my current efforts are on getting vanilla uboot running on these, then I'll start working on the same for vanilla Linux
j
none actually, musl might fit a bit better. BTW the OpenWrt userland plays well on the BMC, although the OpenWrt kernel is far away from working. I've seen somewhere that support for some ARM-based T113 variants is being added to vanilla 6.2, and I've seen some work on the MangoPi board
d
If you ever need to test anything on a MangoPi, I have a couple. Just need to solder in the flash chip sockets (I have 10 flash chips exactly same as on the TPi2, I could solder them directly but it'll be easier to swap them with these sockets) (https://discord.com/channels/754950670175436841/1080282784570019942/1101669200117899284)
c
Oh happy day my devicetree rewrite got merged 😄
...though I may end up needing to do a second rewrite if we go from the Allwinner fork of the kernel back to mainline Linux 🤔
w
Very exciting! Thinking about at what point I should look into doing a firmware upgrade
c
I'm making some progress getting upstream U-Boot to start on my board, but this has me baffled:
It just barely comes up and then it begins spamming 0xF0 on the UART
@DhanOS (Daniel Kukiela) here is the .bin if you want to test it out on a microSD in one of your MangoPi boards (
dd if=u-boot-sunxi-with-spl.bin of=/dev/mmcblk0 bs=512 seek=16
)
d
Sure, I'll give it a spin tomorrow!
I'll be happy to help testing things on the MangoPi and the TPi2
c
The above error was because I screwed up the memory layout; this one boots, at least over FEL (have not tested microSD)
Had a chance to try this yet? (It's fine if not, I just don't want to be missing out on the fun) I suspect that U-Boot's MMC driver is somehow messed up (haven't been able to access my microSD when I boot into it) but I don't know if that will affect SPL (i.e. U-Boot's ability to load off of microSD) too.
(Again: I have only gotten into U-Boot over FEL so far, haven't tried microSD loading yet)
d
Not yet, sorry. I've been moving some services between the servers (unrelated to TPi) and things did not go fully according to plan. But I should test it tomorrow. Things just got in the way.
c
Sounds good. By the way the latest .bin is lacking a few drivers so you probably can't do much with it once you get it up. (My current progress has the Ethernet + MMC drivers working, and I'm working on SPI+NAND right now.) But, confirming that it at least boots to a prompt when you put it on a microSD would be excellent.
Here's the latest, with: - FAT, ext2/ext4, SquashFS support - GPIO buttons, LEDs support (untested, I'm away from my board) - MMC support (able to read files; putting U-Boot itself on SD card still untested) - Boot linux zImage directly (no need for Android's nonsense
mkbootimg
! but untested) - MTD/NAND support (note: partition layout and block ordering very different from current official firmware, use with extreme care) - Ethernet support (can pull kernel/initramfs/dt over TFTP!) - Pull-up resistor on UART3 is enabled, for compatibility with open-drain serial cables
d
I am going to test it in 2 hours
Do you want me to test the previous one too, now that you dropped the new one?
c
Here's my current defconfig and devicetree (mostly sharing for GPL purposes, but also so you can tinker if you want). Tree is a merge of + U-Boot master +
Probably not worth any testing on the older .bins. This newest one should have almost all of the features we want.
If you could test microSD loading, GPIO+LED functionality, and getting the kernel to boot (it needs the
zImage
and .dtb from the buildroot output), that would help tons. I am either unable or too scared to try these remotely (I'm currently 900+ miles away from my board and accessing it via USBIP and a Bus Pirate wired to the UART port 🤣 )
d
I might have multiple questions about how to test the other things you are asking about unless you can write a bit more in advance what to do 🙂 I'm on a steep learning curve with this 🙂
c
Button and LED testing are simple, just play with the
button
and
led
commands in the U-Boot shell
microSD loading is easy, just put it on a microSD with this command and see what the board does when you boot it with the microSD in
Booting the kernel? ... no idea yet, haven't managed to make that happen.
d
Perfect, that's exactly what I needed. I'm not sure if I can manage to boot the kernel due to lack of knowledge, but who knows? 🙂 I'll take a look if I can manage to do this. If this can help, I might even manage to give you some sort of remote access to something?
Like, if something stops working, I'm here to make it work again 🙂
c
Oh, and
<inactive>
means "hasn't been accessed yet" not "not lit/pressed"
d
I'm figuring out how to hook up UART to this thing. Looks like I need to use these 2 small holes and not the headers
c
Looks like it; though the .bin I sent is for the TP2 itself (might work on the MangoPi though)
d
Ok, I thought you wanted me to check on the MangoPi
TPi2 will be much easier
c
Ah, yeah the first 2 .bins were using the MangoPi devtree, the last one is for TP2
But they should be pretty similar
d
It outputs this:
U-Boot SPL 2023.07-rc2-00210-ga88a11bbc4 (May 23 2023 - 12:49:40 -0600)
DRAM: 128 MiB
Trying to boot from MMC1


U-Boot 2023.07-rc2-00210-ga88a11bbc4 (May 23 2023 - 12:49:40 -0600) Allwinner Technology

CPU:   Allwinner R528 (SUN8I)
Model: Turing Machines Turing Pi 2 BMC
DRAM:  128 MiB
sunxi_set_gate: (CLK#24) unhandled
Core:  38 devices, 19 uclasses, devicetree: separate
MMC:   mmc@4020000: 0
Loading Environment from FAT... Unable to read "uboot.env" from mmc0:1...
In:    serial@2500c00
Out:   serial@2500c00
Err:   serial@2500c00
Net:   eth0: ethernet@4500000
Hit any key to stop autoboot:  0
switch to partitions #0, OK
mmc0 is current device
Scanning mmc 0:1...
ethernet@4500000 Waiting for PHY auto negotiation to complete. done
BOOTP broadcast 1
BOOTP broadcast 2
BOOTP broadcast 3
BOOTP broadcast 4
BOOTP broadcast 5
BOOTP broadcast 6
BOOTP broadcast 7
BOOTP broadcast 8
BOOTP broadcast 9
BOOTP broadcast 10
BOOTP broadcast 11
And then:
BOOTP broadcast 12
BOOTP broadcast 13
BOOTP broadcast 14
BOOTP broadcast 15
BOOTP broadcast 16
BOOTP broadcast 17

Retry time exceeded; starting again
missing environment variable: pxeuuid
Retrieving file: pxelinux.cfg/01-02-00-fa-db-69-17
*** ERROR: `serverip' not set
Retrieving file: pxelinux.cfg/00000000
*** ERROR: `serverip' not set
Retrieving file: pxelinux.cfg/0000000
*** ERROR: `serverip' not set
Retrieving file: pxelinux.cfg/000000
*** ERROR: `serverip' not set
Retrieving file: pxelinux.cfg/00000
*** ERROR: `serverip' not set
Retrieving file: pxelinux.cfg/0000
*** ERROR: `serverip' not set
Retrieving file: pxelinux.cfg/000
*** ERROR: `serverip' not set
Retrieving file: pxelinux.cfg/00
*** ERROR: `serverip' not set
Retrieving file: pxelinux.cfg/0
*** ERROR: `serverip' not set
Retrieving file: pxelinux.cfg/default-arm-sunxi-sunxi
*** ERROR: `serverip' not set
Retrieving file: pxelinux.cfg/default-arm-sunxi
*** ERROR: `serverip' not set
Retrieving file: pxelinux.cfg/default-arm
*** ERROR: `serverip' not set
Retrieving file: pxelinux.cfg/default
*** ERROR: `serverip' not set
Config file not found
BOOTP broadcast 1
BOOTP broadcast 2
BOOTP broadcast 3
BOOTP broadcast 4
BOOTP broadcast 5
BOOTP broadcast 6
c
Okay sweet, so it's trying to netboot. Try interrupting it during the countdown?
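The list of `pxelinux.cfg/` files in the log above follows a predictable pattern: the MAC address first, then the IP address in hex with one trailing digit dropped per attempt, then a few `default` names. A minimal sketch that reproduces the same sequence; the `default-arm-sunxi-sunxi` style suffixes are copied from this log rather than derived, and `pxe_candidates` is a hypothetical helper name.

```shell
# Print the PXE config file names in the order seen in the log above.
# $1 = MAC address (colon-separated), $2 = IP address as 8 hex digits.
pxe_candidates() {
    mac=$1
    iphex=$2

    # 1. "01-" (Ethernet hardware type) plus the MAC with dashes
    echo "pxelinux.cfg/01-$(echo "$mac" | tr ':' '-')"

    # 2. The hex IP, dropping one trailing digit per attempt
    while [ -n "$iphex" ]; do
        echo "pxelinux.cfg/$iphex"
        iphex=${iphex%?}
    done

    # 3. Fallback names, most specific first (suffixes taken from the log)
    for suffix in -arm-sunxi-sunxi -arm-sunxi -arm ""; do
        echo "pxelinux.cfg/default$suffix"
    done
}

pxe_candidates 02:00:fa:db:69:17 00000000
```

The hex IP is all zeros here because no DHCP lease was obtained, which is also why every attempt fails with `serverip` not set.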
d
BOOTP broadcast 7
BOOTP broadcast 8
BOOTP broadcast 9
BOOTP broadcast 10
BOOTP broadcast 11
BOOTP broadcast 12
BOOTP broadcast 13
BOOTP broadcast 14
BOOTP broadcast 15
BOOTP broadcast 16
BOOTP broadcast 17

Retry time exceeded; starting again
=>
c
Oh you got a prompt
?
for help
d
Yup
help printed
c
Test the LEDs and buttons? (Do you have a front panel header connected?)
d
I do not, I'm using a bare board, but I can hook up the LEDs
c
key1
is at least easy to test
d
The reset button - isn't it a hardware button?
Yeah
c
The BMC reset is a hardware button yeah, pushing that will force a reset and there's nothing software can do to stop it
d
=> button
button - manage buttons

Usage:
button <button_label>   Get button state
button list             Show a list of buttons
=> button list
fp:power        <inactive>
fp:reset        <inactive>
key1            <inactive>
=> button key1
off
=> button key1
on
=>
c
Sweet
d
Let me grab some LEDs and some wires to hook them up
c
So this means we can have the bootloader check key1 and follow a different boot path if it's held at power-on (good for some type of recovery)
d
Yeah, I see how this can be really helpful
c
The shell environment you're in is somewhat bashlike; do
env print
if you want to see the current environment variables (most of which contain scripts that can be run)
The
bootcmd
var is what runs if you let the autoboot timeout run out, and it seems to follow a whole series of calls to hunt for and run a boot script
So in theory if you have a
/boot.scr
or
/boot/boot.scr
on your microSD it will autorun that.
d
=> led
led - manage LEDs

Usage:
led <led_label> on|off|toggle   Change LED state
led <led_label> Get LED state
led list                show a list of LEDs
=> led list
fp:sys          <inactive>
fp:reset        <inactive>
=> led fp:sys on
=> led fp:reset on
=>
Yeah, I can see it:
boot_prefixes=/ /boot/
boot_script_dhcp=boot.scr.uimg
boot_scripts=boot.scr.uimg boot.scr
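Those two variables combine into the list of script paths that get tried on each device. A minimal sketch of that expansion in plain shell, assuming the prefix-then-script loop order of U-Boot's standard distro boot scripts; `boot_script_candidates` is a hypothetical helper name, and the values are the ones printed above.

```shell
# Expand boot_prefixes x boot_scripts into the candidate script paths.
boot_script_candidates() {
    boot_prefixes="/ /boot/"
    boot_scripts="boot.scr.uimg boot.scr"

    for prefix in $boot_prefixes; do
        for script in $boot_scripts; do
            echo "${prefix}${script}"
        done
    done
}

boot_script_candidates
```

So a `boot.scr` placed in either `/` or `/boot/` of the first microSD partition would be among the candidates.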
So, if you want me to test anything else, at least to start I'd like you to be more descriptive until I learn this stuff a bit 😄
c
Alright, well, the function of U-Boot is to be a lightweight "shell" environment that just locates, loads, and runs the kernel. But the shell is also useful in case you mess up your kernel and it's panicking -- basically it either runs its own scripts to do a normal boot or you can take over and do whatever recovery you want through the UART pins.
It follows a "load-then-use" model, where the file loading commands just take a RAM address and it puts the contents of the file at that address. The boot commands accept the address and execute whatever's been loaded there.
This allows mixing and matching the various ways of retrieving files (load from microSD, over network, from NAND, ...) with the various ways of using files (execute it as a U-Boot script, as a kernel, treat it as a devicetree, ...)
I'm currently trying pretty hard to figure out how to get it to boot the TP2 kernel, so far I have:
setenv bootargs earlycon=uart8250,mmio32,0x05000000 console=ttyS3,115200 loglevel=8 aw-ubi-spinand.ubootblks=24
load mmc 0 ${kernel_addr_r} zImage
load mmc 0 ${fdt_addr_r} sun8iw20p1-t113-turingmachines-tp2bmc.dtb
bootz ${kernel_addr_r} - ${fdt_addr_r}
...which assumes that the
zImage
and
sun8iw20p1-t113-turingmachines-tp2bmc.dtb
files from
buildroot/output/images/
are in the root of the microSD card.
When I try it, it goes:
Kernel image @ 0x41000000 [ 0x000000 - 0x373358 ]
## Flattened Device Tree blob at 41800000
   Booting using the fdt blob at 0x41800000
Working FDT set to 41800000
ERROR: reserving fdt memory region failed (addr=41900000 size=700000 flags=0)
   Loading Device Tree to 42df2000, end 42dffbd5 ... OK
Working FDT set to 42df2000

Starting kernel ...
(And then goes silent)
Which makes me think it's trying to use the
sun8iw20p1-t113-turingmachines-tp2bmc.dtb
, balking at the
/memreserve/ 0x41900000 0x00100000;
directive, falling back to the builtin FDT, and the kernel goes quiet because it doesn't know how to set up the console UART.
d
I guess the image you provided contains neither the zImage nor the dtb file
c
Right, to get those you need to get them out of the buildroot directory after compiling the firmware
d
I tried to
ls
but I'm missing the interface name I guess
c
ls mmc 0
d
Oh, ok, it just lists the filesystem content I have there
I can copy the files over
c
I suspect the
earlycon=
options are wrong
d
When I flashed the bin file, does this override any part of the FAT32 filesystem?
I guess it does?
c
Does your partition begin earlier than the 1MiB mark on your microSD?
d
I did not create a partition just now, so I think it just starts wherever it usually starts
c
I bought some microSDs from Micro Center a few weeks back and those had the partition beginning at 4MiB, but it might be that different manufacturers have different default partitioning
d
How do you check this? (I can also Google probably)
c
fdisk /dev/mmcblk0
->
p
d
Ok, I'll check that and put the files on the card
c
The units are usually in 512-byte sectors, so you should multiply the start number by 512, then divide by 1024 for KiBs and another 1024 for MiBs
(So yeah on my microSD card the start sector is 8192, which corresponds to 4MiB in)
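That conversion as a quick sketch, assuming 512-byte sectors (the unit `fdisk` normally reports); `sectors_to_mib` is a hypothetical helper name.

```shell
# Convert a partition start sector (as printed by fdisk) to MiB.
# Assumes 512-byte sectors.
sectors_to_mib() {
    # sectors -> bytes -> KiB -> MiB
    echo $(( $1 * 512 / 1024 / 1024 ))
}

sectors_to_mib 8192   # the start sector quoted above; prints 4
```

Integer division rounds down, so a start sector that is not MiB-aligned will report the MiB just below it.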
d
1MB
Device     Boot Start     End Sectors  Size Id Type
/dev/sdb1  *     2048 3932159 3930112  1,9G  b W95 FAT32
c
That should be fine then. The U-Boot file I sent is like 600-something KiBs, so I doubt flashing it would have overwritten any of the FAT32 partition
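The same check in code form: the image is written at sector 16 (the `seek=16` from the `dd` command earlier), so it is safe as long as it ends before the first partition starts. A minimal sketch; the 600 KiB image size is a stand-in for "600-something KiBs", not the real file size, and `image_fits` is a hypothetical helper name.

```shell
# Succeeds if an image written at sector 16 ends at or before the
# first partition's start sector. Assumes 512-byte sectors.
image_fits() {
    image_bytes=$1
    part_start_sector=$2

    image_end=$(( 16 * 512 + image_bytes ))   # byte offset just past the image
    [ "$image_end" -le $(( part_start_sector * 512 )) ]
}

# 614400 bytes = 600 KiB (assumed size), partition starting at sector 2048
if image_fits 614400 2048; then
    echo "no overlap"
else
    echo "overlap"
fi
```

To check a real build, substitute the actual size from `stat -c %s u-boot-sunxi-with-spl.bin` (GNU coreutils) and the start sector printed by `fdisk`.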
d
It's a VM with a SD card reader, thus
/dev/sdb
Yeah, I checked the file size and then asked 🙂
OK, let me put the files on the card and try to follow your steps and see what happens 🙂
c
I strongly expect you'll get the same thing. I'm looking into what the earlycon= setting should be so we can at least see Linux complain about why it can't boot
(But reproducibility is the cornerstone of science so by all means, try to duplicate my results!)
earlycon=uart8250,mmio32,0x05000000
->
earlycon=uart8250,mmio32,0x02500c00
might get us some output
d
Also my dtb filename is a bit different but this does not matter I guess
c
Yeah, the filename I gave is for the turingmachines/Firmware repo
d
(
sun8iw20p1-t113-100ask-t113-pro.dtb
)
c
^ that's the old wenyi dtb, which should still work
d
Ah, ok, I'm using the CE repo which I have in hand
c
Best to use what you know works 🙂
d
This one does for sure
=> setenv bootargs earlycon=uart8250,mmio32,0x02500c00 console=ttyS3,115200 loglevel=8 aw-ubi-spinand.ubootblks=24
=> load mmc 0 ${kernel_addr_r} zImage
3962896 bytes read in 656 ms (5.8 MiB/s)
=> load mmc 0 ${fdt_addr_r} sun8iw20p1-t113-100ask-t113-pro.dtb
54900 bytes read in 13 ms (4 MiB/s)
=> bootz ${kernel_addr_r} - ${fdt_addr_r}
Kernel image @ 0x41000000 [ 0x000000 - 0x3c7810 ]
## Flattened Device Tree blob at 41800000
   Booting using the fdt blob at 0x41800000
Working FDT set to 41800000
ERROR: reserving fdt memory region failed (addr=41900000 size=700000 flags=0)
   Loading Device Tree to 42def000, end 42dff673 ... OK
Working FDT set to 42def000

Starting kernel ...
c
Yeah, I got the same results just now.
You're about caught up to where I am, anyway. There might be some debugging commands in U-Boot's documentation to show what it's doing but there isn't much I know how to do with a kernel that doesn't even come up heh
I do know U-Boot does a quick hackup of the FDT before it passes it to the kernel, in order to inject details like the cmdline/bootargs. Maybe we need to find a way to dump the post-hackup FDT.
Maybe it's also not automatically respecting
bootargs
unless we turn on some other setting first
My familiarity with U-Boot kinda runs out here, since I've never actually needed to use it beyond "debricking various embedded devices" before
Okay: it looks like the
ERROR: reserving fdt memory region failed
is not a fatal error, it just fails that one memreserve and moves on.
c
Their kernel at least has the decency to panic
A panic would be a huge improvement over the current situation 😂
d
Tried it with the MangoPi just to see if it's any different and to rule out something on the TPi2, but I'm getting the exact same result
c
It feels like it might be that the correct FDT isn't loading somehow, like maybe it's being overwritten with U-Boot's one.
I don't know if
bootz [addr [initrd[:size]] [fdt]]
means that fdt is the one it should use or just where it should relocate it to before boot
If omitted it seems to try to put it at 0, which is definitely wrong
d
Should it look this way?
Copy code
=> load mmc 0 ${fdt_addr_r} sun8iw20p1-t113-100ask-t113-pro.dtb
54900 bytes read in 13 ms (4 MiB/s)
=> fdt addr ${fdt_addr_r}
Working FDT set to 41800000
=> fdt boardsetup
=> fdt list
/ {
        model = "sun8iw20";
        compatible = "allwinner,r528", "arm,sun8iw20p1";
        interrupt-parent = <0x00000001>;
        #address-cells = <0x00000002>;
        #size-cells = <0x00000002>;
        aliases {
        };
        chosen {
        };
        firmware {
        };
        cpus {
        };
        psci {
        };
        dump_reg@20000 {
        };
        cpu-opp-table {
        };
        dcxo24M_clk {
        };
        rc16m_clk {
        };
        ext32k_clk {
        };
        dram {
        };
        memory@40000000 {
        };
        share_space@0x42100000 {
        };
        interrupt-controller@3020000 {
        };
        timer_arch {
        };
        pmu {
        };
        power-management@ff000000 {
        };
        iommu@2010000 {
        };
        pio-18 {
        };
        pio-33 {
        };
        thermal-zones {
        };
        soc@3000000 {
        };
        vdd-cpu {
        };
        usb1-vbus {
        };
};
Ah, never mind, it's just a list.
fdt print
outputs the whole tree
c
It looks right. I'd also check /chosen since that's where the boot parameters go.
fdt chosen
might be needed to populate it. I don't know what all
fdt boardsetup
does. I'm really working at the fringe of my knowledge here haha
The fact that booting just a zImage WITHOUT specifying a fdt address tries to put it at 0 feels like an important clue.
d
One odd (or maybe not?) thing is that if I reboot my board and immediately run
bootz ${kernel_addr_r} - ${fdt_addr_r}
I get the exact same thing (output). But again, maybe that's ok
c
Hmm, actually it might be that it's still in RAM from before the reboot.
I've noticed this chip seems to have a fair amount of memory persistence, from when I was debugging the SPL. I didn't look more into it than that.
d
Yes, this is exactly the case
Copy code
=> fdt chosen
WARNING: could not set u-boot,version FDT_ERR_NOSPACE.
Could this cause this error?
Copy code
=> fdt rsvmem print
index              start                    size
------------------------------------------------
    0   0000000041900000        0000000000100000
    1   0000000041900000        0000000000700000
    2   0000000042000000        0000000000100000
    3   0000000042100000        0000000000010000
Look at the 0th and 1st reserve
Starting address
Copy code
Working FDT set to 41800000
ERROR: reserving fdt memory region failed (addr=41900000 size=700000 flags=0)
The 1st overlaps the 0th and ends right where the 2nd starts
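The reserve map above can be checked with a little interval arithmetic. The sketch below is only an illustration and not something posted in the thread: it treats each entry as a half-open range [start, start + size), and the helper names `overlaps` and `overlapping_pairs` are made up for this example. Under that assumption, entry 1 fully covers entry 0 and ends exactly where entry 2 begins. Whether U-Boot's reservation code also rejects regions that merely touch is not modelled here.

```python
# Reserved-memory entries copied from the `fdt rsvmem print` output above,
# as (start, size) pairs in index order.
RESERVED = [
    (0x41900000, 0x100000),
    (0x41900000, 0x700000),
    (0x42000000, 0x100000),
    (0x42100000, 0x010000),
]


def overlaps(a, b):
    """Return True if two (start, size) regions share at least one byte.

    Regions are treated as half-open ranges, so regions that only touch
    end-to-start do not count as overlapping (an assumption of this sketch).
    """
    a_start, a_size = a
    b_start, b_size = b
    return a_start < b_start + b_size and b_start < a_start + a_size


def overlapping_pairs(regions):
    """List index pairs (i, j) with i < j whose regions overlap."""
    return [
        (i, j)
        for i in range(len(regions))
        for j in range(i + 1, len(regions))
        if overlaps(regions[i], regions[j])
    ]


if __name__ == "__main__":
    for i, j in overlapping_pairs(RESERVED):
        print(f"entries {i} and {j} overlap")
```

With the four entries from the log, the only overlapping pair is (0, 1), which matches the address reported in the `reserving fdt memory region failed` message.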
When I moved them around as follows:
Copy code
=> load mmc 0 ${fdt_addr_r} sun8iw20p1-t113-100ask-t113-pro.dtb
54900 bytes read in 13 ms (4 MiB/s)
=> fdt addr ${fdt_addr_r}
Working FDT set to 41800000
=> fdt rsvmem print
index              start                    size
------------------------------------------------
    0   0000000041900000        0000000000100000
    1   0000000041900000        0000000000700000
    2   0000000042000000        0000000000100000
    3   0000000042100000        0000000000010000
=> fdt rsvmem delete 1
=> fdt rsvmem print
index              start                    size
------------------------------------------------
    0   0000000041900000        0000000000100000
    1   0000000042000000        0000000000100000
    2   0000000042100000        0000000000010000
=> fdt rsvmem delete 1
=> fdt rsvmem delete 1
=> fdt rsvmem add 0x42000000 0x700000
=> fdt rsvmem print
index              start                    size
------------------------------------------------
    0   0000000041900000        0000000000100000
    1   0000000042000000        0000000000700000
=> fdt rsvmem add 0x42700000 0x100000
=> fdt rsvmem add 0x4200000 0x10000
=> fdt rsvmem print
index              start                    size
------------------------------------------------
    0   0000000041900000        0000000000100000
    1   0000000042000000        0000000000700000
    2   0000000042700000        0000000000100000
    3   0000000004200000        0000000000010000
Then loaded zImage and tried to boot:
Copy code
=> bootz ${kernel_addr_r} - ${fdt_addr_r}
Kernel image @ 0x41000000 [ 0x000000 - 0x3c7810 ]
## Flattened Device Tree blob at 41800000
   Booting using the fdt blob at 0x41800000
Working FDT set to 41800000
   Loading Device Tree to 42def000, end 42dff673 ... OK
Working FDT set to 42def000

Starting kernel ...
This error no longer appears, but also nothing has really changed
Ah, I also got the last one wrong (index 3 is set incorrectly)
But no change anyway (except for the error not appearing anymore):
Copy code
=> load mmc 0 ${fdt_addr_r} sun8iw20p1-t113-100ask-t113-pro.dtb
54900 bytes read in 13 ms (4 MiB/s)
=> fdt addr ${fdt_addr_r}
Working FDT set to 41800000
=> fdt rsvmem delete 1
=> fdt rsvmem delete 1
=> fdt rsvmem delete 1
=> fdt rsvmem add 0x42000000 0x700000
=> fdt rsvmem add 0x42700000 0x100000
=> fdt rsvmem add 0x42800000 0x10000
=> fdt rsvmem print
index              start                    size
------------------------------------------------
    0   0000000041900000        0000000000100000
    1   0000000042000000        0000000000700000
    2   0000000042700000        0000000000100000
    3   0000000042800000        0000000000010000
=> load mmc 0 ${kernel_addr_r} zImage
3962896 bytes read in 656 ms (5.8 MiB/s)
=> bootz ${kernel_addr_r} - ${fdt_addr_r}
Kernel image @ 0x41000000 [ 0x000000 - 0x3c7810 ]
## Flattened Device Tree blob at 41800000
   Booting using the fdt blob at 0x41800000
Working FDT set to 41800000
   Loading Device Tree to 42def000, end 42dff673 ... OK
Working FDT set to 42def000

Starting kernel ...
c
I think I used
fdt resize
to create enough room for chosen to do its thing
d
When I run
bdinfo
, I see:
Copy code
[...]
fdt_blob    = 0x47d3ad00
new_fdt     = 0x47d3ad00
fdt_size    = 0x000081c0
[...]
Do you happen to know what this address is?
Again, maybe it's ok, I'm just looking around not knowing what is what 🙂
c
I'm guessing those are the "control" FDT:
fdtcontroladdr=47d197e0
Which is the FDT used by U-Boot itself
(The dream is to be able to boot the kernel off of the same FDT that U-Boot uses!)
d
I'm not even sure why this is a problem 😛
c
As in, why we have to use 2 different FDTs for now?
d
I guess so
c
Allwinner's fork of Linux (what the firmware is using right now) uses different
compatible =
strings and enums for certain things from the upstream kernel (and U-Boot). 😔
d
Ok, this makes sense now
Also, thank you for all the explanations you provided in the past few hours. You made me somewhat familiar with U-Boot and other things 🙂
c
Myself as well. I've really only used it at a basic level (debricking and troubleshooting bad firmware upgrades on my other devices) and explaining it is helping me grasp the concepts more fully.
d
Are you going to pursue this further? What would be the next steps to try to figure out why the kernel is not booting?
c
I'm definitely not stopping until I get a kernel to boot.
d
I myself checked a lot of things I could find and think of, all with no results
c
(And I also need to add NAND support to the SPL, and a USB device driver so we can use this for flashing the firmware without needing to use PhoenixSuit)
d
Yeah. Is it going to expose the NAND in a similar fashion to how CM4s do?
c
I haven't decided entirely how exactly I'd go about that yet 🤔
My current mental roadmap is we'd have a cross-platform tool that uses FEL to load U-Boot on there, then when U-Boot comes up it runs a small script that makes itself a USB device to do... something. Exposing the NAND back to the tool might be a good option.
Another option would be to boot the kernel with an initramfs + USB gadget driver, then control that booted kernel to do the install
might show some promise
OKAY I fixed the config addresses for the "debug uart" to be UART3 (in the kernel config):
Copy code
CONFIG_DEBUG_UART_PHYS=0x02500c00
CONFIG_DEBUG_UART_VIRT=0xf2500c00
...and added
earlyprintk
to the
bootargs
, and got some output.
This is enough that I can probably start to figure out why it's not behaving right
d
I was curious about the
earlyprintk
since I've seen it here and there
What's the virtual one?
c
I think that's for when the kernel has the MMU up: early boot stuff (early kernel, and all of U-Boot) runs without the MMU enabled, so any CPU instructions that access RAM go straight to physical addresses (think "real mode" from the DOS days), but then the kernel sets up the MMU so it can do virtual address spaces (i.e. give every process its own private memory space) and it ends up having to map the hardware somewhere where it can still access it through the MMU.
So, phys for early boot (pre-MMU) and virt for later boot (post-MMU)
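A small illustration of the phys/virt split, using the two `CONFIG_DEBUG_UART_*` values quoted earlier. The assumptions are loud ones: the "offset" here is simply the difference between those two config values, and `debug_uart_address` is a hypothetical helper, not a kernel function. How the kernel actually sets up this mapping is not shown.

```python
# Values copied from the kernel config lines quoted above.
DEBUG_UART_PHYS = 0x02500C00  # CONFIG_DEBUG_UART_PHYS (UART3, pre-MMU address)
DEBUG_UART_VIRT = 0xF2500C00  # CONFIG_DEBUG_UART_VIRT (same UART, post-MMU address)

# Assumption for this sketch: the two addresses differ by a constant offset.
OFFSET = DEBUG_UART_VIRT - DEBUG_UART_PHYS


def debug_uart_address(mmu_enabled):
    """Return the address early debug output would use in each boot phase."""
    if mmu_enabled:
        return DEBUG_UART_PHYS + OFFSET  # later boot: go through the mapping
    return DEBUG_UART_PHYS  # early boot: raw physical address


if __name__ == "__main__":
    print(f"offset   = {OFFSET:#010x}")
    print(f"pre-MMU  = {debug_uart_address(False):#010x}")
    print(f"post-MMU = {debug_uart_address(True):#010x}")
```

Note that the physical value is the same address passed to `earlycon=uart8250,mmio32,0x02500c00` in the bootargs.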
d
Makes sense. Thank you for the clear and informative explanations. I'm enjoying them a lot!
So, are you getting something through
earlyprintk
?
c
I am not being a very good scientist here in that the second test has the full cmdline + the official FDT, while the first test is just
earlyprintk
on U-Boot's FDT. I'm not changing only one variable at a time.
But the second test is what's been failing behind the scenes for us since we first tried
d
Bad state in the Power State Coordination Interface
Tells me nothing 😛
But yeah, it at least prints the issue
c
I also added
full-cmdline-but-uboot-fdt.txt
to the gist, which uses the control (U-Boot's) FDT to boot with the same cmdline as
full-cmdline.txt
So it panics early if we give it the official firmware FDT but gets a good deal further (to the point of trying to mount the rootfs) if we give it the U-Boot FDT.
d
Yeah, I can see that
c
It also goes dead silent if I take out
earlyprintk
so it's not switching to the correct console either way
d
So you are just changing the FDT address without loading the official firmware dtb file as in previous commands?
c
I'm doing:
setenv bootargs ...; load mmc 0 ${loadaddr} zImage; bootz ${loadaddr} - ${fdtcontroladdr}
d
Yeah, this is what I meant
Sorry if I'm asking too many questions, but previously we've been using
kernel_addr_r
for zImage and it's different from
loadaddr
or at least is different in what I have. Is there any reason for this?
I'm just super curious 😄
c
The only reason I changed is because
bootz
on its own defaults to using
loadaddr
, so I'm like "Eh, might as well load stuff at
loadaddr
to keep things simple."
d
Ah, ok:) Thank you
c
kernel_addr_r
is also an appropriate choice though
You could even make up your own address as long as it's a valid location in RAM and doesn't overlap anything important
d
Yeah, I've been looking a bit at the memory map already (if that's what it's called)
c
Okay so the official .dtsi (and therefore official FDT) contains this:
Copy code
psci {
                compatible = "arm,psci-1.0";
                method = "smc";
        };
...it's saying "you can use
smc
calls to access the PSCI subsystem" which is evidently untrue because it causes a kernel panic when Linux tries
Could you try loading up the official FDT again and using the
fdt
commands to delete the
/psci
node?
d
Sure
Give me a sec
c
"smc" is apparently a call into the secure processor, but since we aren't booting OP-TEE (like the official firmware does) the secure processor isn't running, so the kernel just panics when trying to talk to it?
d
It booted!
c
Oh nice okay
My guess is it got far enough to panic when it couldn't find rootfs 😂
d
Kernel-panics at the fs mount
Yeah 🙂
Want a full dump?
c
Oh I'd rather go for the finish line
Let me check my notes for the missing bootargs to get the system fully up...
d
This?
Copy code
[    9.864188] ---[ end Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0) ]---
c
Oh no I mean, "I think we're close enough to getting it to boot fully that we should just shoot for that"
d
Ah, sorry 😄
c
Add these:
ubi.mtd=sys root=ubi0_5 rootfstype=ubifs,rw init=/sbin/init partitions=mbr@ubi0_0:boot-resource@ubi0_1:env@ubi0_2:env-redund@ubi0_3:boot@ubi0_4:rootfs@ubi0_5:recovery@ubi0_6:dsp0@ubi0_7:private@ubi0_8:UDISK@ubi0_9:
...to the existing bootargs
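The `partitions=` argument follows a regular pattern: each volume name is paired with its UBI volume index and terminated by a colon. A minimal sketch that rebuilds the string from an ordered list of names (the list is read off the bootargs above; the variable names are only for illustration):

```python
# Volume names in UBI volume order, read off the bootargs quoted above.
VOLUMES = [
    "mbr",
    "boot-resource",
    "env",
    "env-redund",
    "boot",
    "rootfs",
    "recovery",
    "dsp0",
    "private",
    "UDISK",
]

# Each entry is "<name>@ubi0_<index>:" (note the trailing colon on every entry).
PARTITIONS = "".join(f"{name}@ubi0_{index}:" for index, name in enumerate(VOLUMES))

# The root= argument points at the volume whose name is "rootfs".
ROOT = f"ubi0_{VOLUMES.index('rootfs')}"

if __name__ == "__main__":
    print(f"root={ROOT} partitions={PARTITIONS}")
```

It also makes the link to `root=ubi0_5` visible: index 5 is the `rootfs` volume.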
d
Ok
c
If it boots we can call that a win and I'll probably go to bed (need to sleep on what's the best thing to do about that
/psci
node)
d
I was actually heading to my bed when you wrote about
psci
c
Well I don't want to keep you up any later either, so here's hoping we can secure a full boot 😄
d
Nah, if I did not want to, I would not have come back 😄
It went further but bad things are happening
Prints some memory or something
Like this:
Copy code
[   46.198019] 0001e5e0: d9 06 6a bd b1 3b 39 50 bb 3d ad 35 ee c6 3c ac 74 04 21 a3 c5 6b 4b 88 b7 da 68 ee 4e c2 5e ba  ..j..;9P.=.5..<.t.!..kK...h.N.^.
[   46.198041] 0001e600: e0 94 b4 19 ee 1f b7 5c 7f 80 5c 3e 6d 5e d2 51 a4 a5 a7 44 89 d6 53 9f c1 17 04 f6 d5 f0 31 46  .......\..\>m^.Q...D..S.......1F
[   46.198063] 0001e620: dc 45 0f c9 c3 d0 09 0e 00 96 2f fd be 8a cf fd 27 b3 ec 9d a7 e7 a7 f1 cd 40 e2 bf b1 97 ac 2d  .E......../.....'........@.....-
[   46.198085] 0001e640: 98 9c 86 fd 53 7e fe e4 b4 1b 3c 39 f1 e6 93 68 2f e9 07 3c 54 7a de 9f ee b0 bb 85 4d c9 72 7a  ....S~....<9...h/..<Tz......M.rz
[   46.198107] 0001e660: 9e a7 fb f8 fb e1 8c eb ab 2d 72 51 3d f0 ff 04 f5 95 b3 df 64 c4 1a 2f 1f 03 fd 58 cd 2e c6 68  .........-rQ=.......d../...X...h
[   46.198129] 0001e680: c6 99 d1 f4 9a be b8 66 61 93 fd 0a d3 98 ff 9a f9 aa 70 c4 5f 65 fa 40 32 01 f8 b4 ee 7c 95 ff  .......fa.........p._e.@2....|..
[   46.198151] 0001e6a0: 1b d0 af 50 dd 76 30 ce af 7c 06 ff 40 e5 97 98 26 de 87 fa ec 55 f8 e3 a7 bf c5 b1 79 26 10 df  ...P.v0..|..@...&....U......y&..
[   46.198173] 0001e6c0: d5 e6 d1 74 b9 67 c2 be 70 06 3d 87 ee 74 0c f1 4a 13 4d ab cb 21 63 70 16 3b 0c 97 13 eb 9d e3  ...t.g..p.=..t..J.M..!cp.;......
[   46.198195] 0001e6e0: 48 eb 32 96 c3 17 4f f5 64 10 bb 3e 5b e1 4c 39 0b c6 b3 bd d6 59 e8 3f fa 8d 98 dc f6 3e 0d fe  H.2...O.d..>[.L9.....Y.?.....>..
[   46.198217] 0001e700: fa 02 b5 4f 5f aa f6 e9 68 71 cc 93 02 9d 58 b3 47 14 51 bd d6 ad 8d 27 61 bb ec a2 6f a7 01 4b  ...O_...hq....X.G.Q....'a...o..K
[   46.198239] 0001e720: 88 ee 88 d7 f4 36 ce cb 0b 25 ad d3 3c 2d 87 c7 c5 97 b0 bf c2 35 d0 a1 99 f9 69 e9 83 34 99 f5  .....6...%..<-.......5....i..4..
[   46.198261] 0001e740: 22 68 27 ea 7e d9 71 e9 43 75 35 ed cb f9 8d f1 42 1f 6c 1e e2 12 b3 1d 7a 89 5a 89 83 9c cc a7  "h'.~.q.Cu5.....B.l.....z.Z.....
[   46.198283] 0001e760: 35 7c 9e e2 11 fa 29 1e c1 ab 78 84 63 45 31 0f c9 0c f9 7f 80 cd 2a ed 09 c0 78 3b 4c 77 e0 1f  5|....)...x.cE1.......*...x;Lw..
c
That looks like the dump that UBI gives when it can't read the flash properly
d
It printed something in between too, but I'd have to extend the console buffer to be able to dump and read it
It's still going 🙂
For my notes:
Copy code
setenv bootargs earlycon=uart8250,mmio32,0x02500c00 console=ttyS3,115200 loglevel=8 aw-ubi-spinand.ubootblks=24 ubi.mtd=sys root=ubi0_5 rootfstype=ubifs,rw init=/sbin/init partitions=mbr@ubi0_0:boot-resource@ubi0_1:env@ubi0_2:env-redund@ubi0_3:boot@ubi0_4:rootfs@ubi0_5:recovery@ubi0_6:dsp0@ubi0_7:private@ubi0_8:UDISK@ubi0_9:
load mmc 0 ${kernel_addr_r} zImage
load mmc 0 ${fdt_addr_r} sun8iw20p1-t113-100ask-t113-pro.dtb
fdt addr ${fdt_addr_r}
fdt rm /psci
bootz ${kernel_addr_r} - ${fdt_addr_r}
I wonder if this is a result of what I flashed to my board previously
c
All looks good to me
d
Maybe I took the wrong one
c
Oh! Yeah maybe the
zImage
still has SIMULATE_MULTIPLANE off?
d
Like I never reverted the firmware on one of them
Switching boards
Yes, this is the "bad" one 🙂
Just checked to make sure
c
Ah it works on the other board?
d
Switching
Realized I checked before switching, so I connected it again
Sec
Same thing, though
A photo was currently the only way I could catch this:
c
Was your
zImage
built with SIMULATE_MULTIPLANE still turned off?
d
Oh, wait
Maybe
c
Check
buildroot/output/build/linux-*/.config
d
Quite possibly
I think I might have the official v1.0.0 build somewhere
Sec
I do
Updating the files
c
I have a seething hatred for SIMULATE_MULTIPLANE 😔
d
It is booting, but something's different
It's taking a minute already
c
Interesting, so it's booting more slowly or...?
d
It has some delays at some points
I'll put relevant part of the output here in a sec
Copy code
[   11.220050] ubi0: scanning is finished
[   11.395270] ubi0: attached mtd3 (name "sys", size 123 MiB)
[   11.402430] ubi0: PEB size: 262144 bytes (256 KiB), LEB size: 258048 bytes
[   11.410949] ubi0: min./max. I/O unit sizes: 4096/4096, sub-page size 2048
[   11.420016] ubi0: VID header offset: 2048 (aligned 2048), data offset: 4096
[   11.428106] ubi0: good PEBs: 492, bad PEBs: 0, corrupted PEBs: 0
[   11.434937] ubi0: user volume: 10, internal volumes: 1, max. volumes count: 128
[   11.443546] ubi0: max/mean erase counter: 3/1, WL threshold: 4096, image sequence number: 0
[   11.453072] ubi0: available PEBs: 0, total reserved PEBs: 492, PEBs reserved for bad PEB handling: 20
[   11.463774] ubi0: background thread "ubi_bgt0d" started, PID 950
[   11.476147] sun8iw20-pinctrl 2000000.pinctrl: pin PB6 already requested by 2500c00.uart; cannot claim for 2000000.pinctrl:38
[   11.490220] sun8iw20-pinctrl 2000000.pinctrl: pin-38 (2000000.pinctrl:38) status -22
[   11.499281] ERR: id gpio_request failed
[   11.516588] alloc_fd: slot 0 not NULL!
[   11.536861] UBIFS (ubi0:5): Mounting in unauthenticated mode
[   11.941844] UBIFS (ubi0:5): recovery needed
[   12.376142] UBIFS (ubi0:5): recovery deferred
[   12.383227] UBIFS (ubi0:5): UBIFS: mounted UBI device 0, volume 5, name "rootfs", R/O mode
[   12.392806] UBIFS (ubi0:5): LEB size: 258048 bytes (252 KiB), min./max. I/O unit sizes: 4096 bytes/4096 bytes
[   12.404135] UBIFS (ubi0:5): FS size: 31223808 bytes (29 MiB, 121 LEBs), journal size 8515584 bytes (8 MiB, 33 LEBs)
[   12.416158] UBIFS (ubi0:5): reserved for root: 0 bytes (0 KiB)
[   12.422861] UBIFS (ubi0:5): media format: w4/r0 (latest is w5/r0), UUID 51DFA355-BA64-4A15-A9B5-8B4ED2B09994, small LPT model
[   12.444512] VFS: Mounted root (ubifs filesystem) readonly on device 0:15.
[   12.464422] devtmpfs: mounted
[   12.475645] Freeing unused kernel memory: 1024K
[   12.481457] Run /sbin/init as init process
[   12.747525] hdmi_hpd_sys_config_release
INIT: version  booting
INIT: No inittab.d directory found
[   14.031462] UBIFS (ubi0:5): completing deferred recovery
[   14.390660] UBIFS (ubi0:5): background thread "ubifs_bgt0_5" started, PID 963
[   14.417969] UBIFS (ubi0:5): deferred recovery completed
INIT: Entering runlevel: 3
Starting syslogd: OK
Starting klogd: OK
Running sysctl: OK
Populating /dev using udev: [   17.668102] udevd[1010]: starting version 3.2.11
[   18.200262] udevd[1011]: starting eudev-3.2.11
done
Starting adb [   30.490789] file system registered
install_listener('tcp:5037','*smartsocket*')
[   31.073029] read descriptors
[   31.080389] read strings
[   57.077454]
[   57.077454] insmod_device_driver
[   57.077454]
[   57.085547] sunxi_usb_udc 4100000.udc-controller: 4100000.udc-controller supply udc not found, using dummy regulator
device_chose finished!
Initializing random number generator: FAIL
Starting rpcbind: OK
Starting network: [   59.212171] libphy: 4500000.eth: probed
[   59.248257] sunxi-gmac 4500000.eth eth0: eth0: Type(7) PHY ID 001cc816 at 0 IRQ poll (4500000.eth-0:00)
udhcpc: started, v1.35.0
udhcpc: broadcasting discover
[   62.517698] sunxi-gmac 4500000.eth eth0: Link is Up - 100Mbps/Full - flow control off
[   62.526619] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
udhcpc: broadcasting discover
udhcpc: broadcasting discover
udhcpc: no lease, forking to background
OK
Starting sshd: OK
Starting collectd: OK
 _____ _   _ ____  ___ _   _  ____
|_   _| | | |  _ \|_ _| \ | |/ ___|
  | | | | | | |_) || ||  \| | |  _
  | | | |_| |  _ < | || |\  | |_| |
  |_|  \___/|_| \_\___|_| \_|\____|
[   72.098767] FAT-fs (mmcblk0p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
Hello bmct start build time = 12:46:40-Nov 17 2022------>

Welcome to Turing pi
turing login: bmc init start
Cannot open /sys/class/gpio/gpio115/direction.
ping: sendto: Network is unreachable
And it currently sits there
Ah, wait
This is the missing prompt
Copy code
ping: sendto: Network is unreachable

Welcome to Turing pi
turing login: root
Password:
#
c
We'll definitely have to troubleshoot the slower boot; I might start by trying to delete the
/firmware/optee
node too, but...
I think we can call this a win for now
d
I guess so 😄
Let me try this last thing 🙂
c
Hey @svenrademakers ^ Daniel just became the first person to boot a TP2 without any of Allwinner's proprietary stuff (mostly-vanilla U-Boot on a microSD, but it proves that we can de-proprietarize the boot process)
I'mma go to bed, I shouldn't be up any later haha
d
It's 8:24 am here 😄
Booting
If you have a minute still...
c
I think I'll switch to my phone and read in bed for a bit, but it might be a good exercise to turn your boot commands into a
boot.scr
(note: this is not just a text file, it's made with
mkimage
, no idea how, could be a good exercise if you want something more to do)
d
So, it's the same. Or at least I do not see any difference (there might be some in the log)
For my notes:
Copy code
setenv bootargs earlycon=uart8250,mmio32,0x02500c00 console=ttyS3,115200 loglevel=8 aw-ubi-spinand.ubootblks=24 ubi.mtd=sys root=ubi0_5 rootfstype=ubifs,rw init=/sbin/init partitions=mbr@ubi0_0:boot-resource@ubi0_1:env@ubi0_2:env-redund@ubi0_3:boot@ubi0_4:rootfs@ubi0_5:recovery@ubi0_6:dsp0@ubi0_7:private@ubi0_8:UDISK@ubi0_9:
load mmc 0 ${kernel_addr_r} zImage
load mmc 0 ${fdt_addr_r} sun8iw20p1-t113-100ask-t113-pro.dtb
fdt addr ${fdt_addr_r}
fdt rm /psci
fdt rm /firmware/optee
bootz ${kernel_addr_r} - ${fdt_addr_r}
I will try that 🙂
Done 🙂
It now boots straight to the prompt 🙂
Copy code
daniel@tpi2-firmware:~$ mkimage -c none -A arm -T script -d boot.txt boot.scr
Image Name:   
Created:      Wed May 24 08:36:06 2023
Image Type:   ARM Linux Script (gzip compressed)
Data Size:    530 Bytes = 0.52 KiB = 0.00 MiB
Load Address: 00000000
Entry Point:  00000000
Contents:
   Image 0: 522 Bytes = 0.51 KiB = 0.00 MiB
t
is happy about the progress... and anxiously waits some more... ;-)
s
I enjoyed reading this back 🙂 @DhanOS (Daniel Kukiela) @CFSworks awesome work you guys are doing. Freeing ourselves from the proprietary software is going to improve our quality of life a lot. Once you've had some sleep, could you elaborate on which U-Boot version you used? Was it the commit you mentioned before?
c
It's U-Boot's latest
master
with two patchsets from Andre Przywara and Maksim Kiselev merged in, for T113 and T113-SPI support respectively. (I'm working with them to get those respective patchsets merged upstream, but for the time being we might have to have a directory of .patch files)
d
So, I grabbed the logs from the original and the U-Boot boot processes:
I then used a diff tool and post-processed logs a bit so it's easier to read and see things:
This is an HTML file showing the differences
I think the slower boot process is caused by the fact that it uses only a single CPU core with the new U-Boot
Also, the original firmware does this:
Copy code
[    0.000000] psci: probing for conduit method from DT.
[    0.000000] psci: PSCIv1.0 detected in firmware.
[    0.000000] psci: Using standard PSCI v0.2 function IDs
[    0.000000] psci: MIGRATE_INFO_TYPE not supported.
[    0.000000] psci: SMC Calling Convention v1.0
The new U-Boot also causes this:
Copy code
[    4.463634] ------------[ cut here ]------------
[    4.463711] WARNING: CPU: 0 PID: 26 at drivers/clk/sunxi-ng/ccu_common.c:34 ccu_nm_set_rate+0x260/0x298
[    4.463734] Modules linked in:
[    4.463779] CPU: 0 PID: 26 Comm: kworker/0:1 Not tainted 5.4.61 #2
[    4.463802] Hardware name: Generic DT based system
[    4.463849] Workqueue: events start_work
[    4.463920] [<c010e048>] (unwind_backtrace) from [<c010a810>] (show_stack+0x10/0x14)
[    4.463972] [<c010a810>] (show_stack) from [<c05e803c>] (dump_stack+0x7c/0x98)
[    4.464024] [<c05e803c>] (dump_stack) from [<c011917c>] (__warn+0xb8/0xd0)
[    4.464075] [<c011917c>] (__warn) from [<c0119204>] (warn_slowpath_fmt+0x70/0x9c)
[    4.464129] [<c0119204>] (warn_slowpath_fmt) from [<c037b314>] (ccu_nm_set_rate+0x260/0x298)
[    4.464187] [<c037b314>] (ccu_nm_set_rate) from [<c03737e0>] (clk_change_rate+0x104/0x238)
[    4.464244] [<c03737e0>] (clk_change_rate) from [<c0374154>] (clk_core_set_rate_nolock+0x124/0x138)
[    4.464299] [<c0374154>] (clk_core_set_rate_nolock) from [<c03741a0>] (clk_set_rate+0x38/0x5c)
[    4.464354] [<c03741a0>] (clk_set_rate) from [<c0334af8>] (lcd_clk_enable+0x144/0x454)
[    4.464407] [<c0334af8>] (lcd_clk_enable) from [<c033575c>] (disp_lcd_enable+0x13c/0x2d4)
[    4.464466] [<c033575c>] (disp_lcd_enable) from [<c032fbf8>] (disp_device_attached_and_enable+0x134/0x264)
[    4.464526] [<c032fbf8>] (disp_device_attached_and_enable) from [<c032fe68>] (bsp_disp_device_switch+0x84/0xe8)
[    4.464582] [<c032fe68>] (bsp_disp_device_switch) from [<c032a7b4>] (disp_device_set_config+0xc8/0xf0)
[    4.464636] [<c032a7b4>] (disp_device_set_config) from [<c032a8e0>] (start_work+0x104/0x16c)
[    4.464693] [<c032a8e0>] (start_work) from [<c012f268>] (process_one_work+0x16c/0x20c)
[    4.464750] [<c012f268>] (process_one_work) from [<c012f874>] (worker_thread+0x230/0x2d4)
[    4.464805] [<c012f874>] (worker_thread) from [<c013403c>] (kthread+0x118/0x124)
[    4.464855] [<c013403c>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
[    4.464885] Exception stack(0xc6b29fb0 to 0xc6b29ff8)
[    4.464921] 9fa0:                                     00000000 00000000 00000000 00000000
[    4.464964] 9fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    4.465002] 9fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    4.465047] ---[ end trace 70ba11ab2ba2b0ff ]---
There are also a few other things
c
I was suspecting the same thing actually, it's good that what you've found supports this and brings you to the same conclusion 😄
PSCI is an interface where the firmware (in our case, U-Boot) leaves a small handler registered that the OS can call into in order to enable/disable other CPU cores (and possibly other things like managing clock speeds)
Since our new U-Boot isn't providing PSCI, we had to remove the
/psci
node which told Linux it could use it, but that also means it has no way of turning on the second CPU core.
I don't think the old U-Boot, strictly speaking, provides it either but rather it's a function of OP-TEE (which Allwinner loads right before U-Boot). There's code in U-Boot to provide PSCI, but this patch series doesn't seem to know how to manage the CPU cores of the T113. I might have to be the one to provide this patch 😔
d
Well... if I only knew as much about that as you do 🙂
c
I've been staring at the T113-S3 user manual for the past however long trying to figure out how to support it.
This is the file that has to be updated for T113's stuff:
None of the
cpucfg
stuff is the same (so it all has to be updated for T113) but I was able to find the GIC and I also have
sunxi_set_entry_address
done
d
I can, at least, read this code 😛 (not to be confused with understanding it 😛)
I mean, I kind of understand what it is, but my lack of knowledge and experience in this area makes me unable to provide any help here
c
It's a lot closer to the metal than I usually work too.
At any rate, being able to provide PSCI would be nice, since we then wouldn't have to delete that
/psci
node anymore, and we can probably use the FDT unmodified.
Fixing PSCI is still on my radar - I've just been really busy with "real" work since yesterday morning, but it does look like it won't take me long once I have time to do it.
d
Wait, what? Not done yet? 😛
I feel you, though, I'm busy at work too. I was thinking of maybe looking at this to see what I can come up with, but with zero experience that'd probably take too long 😄 I think I'm going to enjoy reading your patch file, though 🙂
Also, I guess no one expected you to have a solution in a day or two 🙂
j
the two of you have made incredible progress in short time already! even though this is way out of my depth i have very much enjoyed following along.
d
This is mostly CFSworks, though 🙂 I'm just hanging around, learning and helping to test (and even a little to debug) things.
b
Hey guys, looking to flash my BMC since I lost access a while ago trying to set a fixed MAC address
Is there any recent (beta) release I can try / test? 🙂
d
There's nothing you can flash to use it daily, not yet at least
c
Here's my WIP patch, though it doesn't work yet (Linux stalls at
[    0.123137] smp: Bringing up secondary CPUs ...
):
I suspect I'm not actually powering up the core, because my
sunxi_cpu_set_power
is empty, and I don't know anything about the PRCM registers so I don't know what I need to be hitting to get them powered up
d
If I read this correctly, the second core is powered, but held in a reset state?
c
Hm... that makes sense with what that Linux patch is doing.
d
Also:
So the point is (if I'm not wrong) to find a register which de-asserts reset signal for CPU1
c
My code for this:
Copy code
c
static void __secure sunxi_cpu_set_reset(int cpu, bool active)
{
        if (active)
                clrbits_le32(SUNXI_CPUX_BASE + C0_RST_CTRL, BIT(cpu));
        else
                setbits_le32(SUNXI_CPUX_BASE + C0_RST_CTRL, BIT(cpu));
}
Since 0 is assert and 1 is deassert, clrbits on active seems correct. (I'm just making sure I didn't make a subtle mistake somewhere...)
d
Digging into the manual
I'll let you know if I find anything
c
👍
I probably got a flag backwards somewhere and double-checking my work ought to find it
d
Which bit value are you using for CPU1?
The bit should be 1 and de-assert is 1 indeed
c
cpu==1 would mean BIT(1) which expands to (1<<1) so 2
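To make the bit logic concrete, here is a rough model of the reset helper on a plain integer instead of a hardware register. Everything in it is an assumption for illustration: `clrbits32` and `setbits32` are stand-ins for U-Boot's `clrbits_le32`/`setbits_le32`, `set_reset` mirrors the C function above without touching any hardware, and the starting register values are invented.

```python
MASK32 = 0xFFFFFFFF  # keep results within a 32-bit register width


def BIT(n):
    """Same meaning as the C macro: a value with only bit n set."""
    return 1 << n


def clrbits32(value, mask):
    """Stand-in for clrbits_le32: clear the bits in mask."""
    return value & ~mask & MASK32


def setbits32(value, mask):
    """Stand-in for setbits_le32: set the bits in mask."""
    return (value | mask) & MASK32


def set_reset(reg, cpu, active):
    """Model of sunxi_cpu_set_reset: bit = 0 asserts reset, bit = 1 releases it."""
    if active:
        return clrbits32(reg, BIT(cpu))
    return setbits32(reg, BIT(cpu))


if __name__ == "__main__":
    reg = 0b01  # invented starting value: CPU0 running, CPU1 held in reset
    reg = set_reset(reg, 1, False)  # release CPU1 from reset
    print(f"after release: {reg:#04b}")
```

Calling `set_reset(reg, 1, False)` sets bit 1 (value 2), which is the deassert case discussed above.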
The other thing to look at would be my
sunxi_set_entry_address
d
Does
active
mean desired state?
c
active
means whether reset is asserted, so we're doing
sunxi_cpu_set_reset(1, false);
(To leave the reset state)
d
Ok, so this is a reset state, makes sense
c
I'm gonna check this code in a disassembler just to make sure it's landing on the absolute right registers
Dunno if you speak ARM assembly:
```arm
0x42e81510      001000e3       movw r1, 0
0x42e81514      011940e3       movt r1, 0x901
0x42e81518      002091e5       ldr r2, [r1]                ; psci.c:0
0x42e8151c      1320c2e1       bic r2, r2, r3, lsl r0      ; psci.c:191  if (active)
0x42e81520      002081e5       str r2, [r1]                ; psci.c:0
```
d
Did you put patched code somewhere?
c
My diff is in the footer of my email
d
Already the above answered my question
I've been checking this:
```arm
0x42e81510      001000e3       movw r1, 0
0x42e81514      011940e3       movt r1, 0x901
```
c
Or wait, that's the "assert reset" that's the first step of the power-on
Entry point address is definitely being written to
0x70005c8
d
If you find a moment, can you attach this patch file here? I tried to copy it from the footer of your email, but due to some formatting done to it, I cannot apply it. It would be easier than trying to fix it by hand. I also don't need it now (I'm working right now)
c
I think what's missing is the config for
ARMV7_SECURE_BASE
(see
arch/arm/cpu/armv7/Kconfig
) to relocate the PSCI code into the secure RAM area
The official firmware uses OP-TEE to run the secure area, so reverse-engineering that would tell us where the secure RAM is located (I do not see it in the user manual)
I'd look more into it but, I have to be a responsible adult and take care of my actual work (I should be a lot freer after this weekend)
d
Would it be:
?
Don't think that's the thing you are searching for, though?
c
I think that's control registers for managing the secure memory.
The memory itself needs to be at least 8KiB since that's how big the secure segment is in u-boot
```
[    0.106693] Setting up static identity map for 0x40100000 - 0x40100060
[    0.114478] rcu: Hierarchical SRCU implementation.
[    0.122954] smp: Bringing up secondary CPUs ...
1B!abcdef;-)
```
Does not look like much but I added a bunch of debug prints to the asm code that initializes the second CPU and it's definitely awake.
(
1
-> request to turn on CPU1,
B!
-> CPU1 beginning its init routine,
abcdef
are the calls within the routine, and
;-)
is
psci_arch_init
)
d
So what does that mean? I thought you need to move the psci code to the secure memory to make it work
Like will I be able to boot the BMC using both cores now?
I tried to search for the secure memory address but without luck
c
It seems like with this SoC you can just mark an arbitrary region of DRAM as secure (U-Boot already does this). It just means the core is coming up, Linux just isn't catching it somehow
d
About setting DRAM as secure - this is what I kind of figured by looking around, but I wasn't sure if this is the case; you've confirmed it
So, the version of U-Boot I have already initializes both cores?
(I'm not sure when you applied the patch you mentioned earlier)
c
The patch I posted today was something I wrote today
I thought it wasn't bringing up the second core but really it IS up, just not making it far enough for Linux to see it
So my patch is probably okay and there's a bug elsewhere
d
You're kind of making me want to set things up so I can play with it on my own and have some fun (or maybe actually even do something useful)
😄
c
Linux is asking for core #2 using PSCI and waiting indefinitely for it to show up here, basically
And my patch (the one from today) is bringing the core up, and it does run code, it just doesn't ever get to the state Linux wants it in
d
Do you have any plans of putting this into some sort of repo with brief explanation how you compile and run it? It'd be a nice thing (at least for me) to learn something new, play with it and maybe be more helpful 🙂
c
It's pretty much just with a handful of quality of life patches. The only things I've added in are a patchset to support SPI, a fix to the clock driver, a devicetree for the TP2, and better Clang support (which isn't needed when using a GNU toolchain)
I shared the config and devicetree here
d
I guess I'll walk through the history and see if I can put it together
c
If we go much longer without this stuff being merged to U-Boot master I can put what I have up on GitHub but I'm hoping things will be merged before then
d
I'm not sure if I've seen all the things you just mentioned here
c
Most of what I've mentioned isn't needed for simple tinkering. You can probably just grab the apritzel branch, my devicetree and defconfig, and roll with that
I got PSCI working, the problem is the T113's CBAR (a read-only register which is supposed to contain the base address of the GIC) contains the value
0x01C80000
, which is correct in most of the Allwinner sunxi series... but in newer chips they moved it to
0x03020000
U-Boot trusts that the CBAR is correct to reconfigure the GIC so the OS can manage the interrupts from non-secure mode, but because it wasn't correct it just ends up throwing away those writes and the GIC doesn't trust Linux once it boots non-secure. So interrupts flat-out don't work.
The funny thing is it had nothing to do with PSCI at all. The second core was coming up just fine. But the first core wasn't going to be able to boot with or without SMP because it wasn't being given the permissions it needed to manage the GIC.
So, "kind of a hardware bug but we can work around it in software"
d
YAY, Nice!
c
The bad news is it still boots slowly, which I'm thinking is perhaps a lack of CPU frequency scaling?
But I'm in the process of walking back all of my PSCI debug edits to make sure none of them are also necessary
d
It returns both on vanilla and u-boot:
```
[    0.019205] /cpus/cpu@0 missing clock-frequency property
[    0.006018] /cpus/cpu@1 missing clock-frequency property
```
if this is what you mean
Does it now say it brought up 2 cpus?
c
I do see 2 CPUs with this fixed
d
If you manage to have something to share, I could run it on my board and compare the logs to see if I can see anything, but I'm guessing I won't find anything
c
More up-to-date build, with SMP support
I suspect you probably still need to be deleting the
/psci
node (U-Boot synthesizes its own anyway)
d
I'll check this build shortly, and also whether I still need to remove the
/psci
node. Thank you
c
A good thing to compare would be the directory entries under
/sys/devices/system/cpu/cpu0/
with both Allwinner's boot and this new U-Boot.
d
Sure
c
Probably also
/sys/devices/system/cpu/
to see if there's a
cpufreq
directory that disappears when booting from U-Boot.
Since my guess as to why it's slow is Linux doesn't have access to the throttle to raise the CPU speed from boot clocks when there's load
d
I'll finish what I'm doing and I'll test it
cpufreq
does not exist when booted up the official way
Booted up without removing the
/psci
node
c
w/ both CPUs online?
d
Checking and analyzing the log now
Correct:
```
smp: Brought up 1 node, 2 CPUs
SMP: Total of 2 processors activated (96.00 BogoMIPS).
```
The boot logs look so much more similar now, but I still have to do some things to be able to compare them fully since the lines do not appear in the same order
Moved for a second to this and both directories (with their content) look identical
I used:
```
find /sys/devices/system/cpu/ -type f -print -exec cat "{}" \;
```
to print the output, but this does not follow symlinks, so I also used:
```
find /sys/devices/system/cpu/ -follow -maxdepth 8 -print -exec cat "{}" \;
```
but this contains recursive symlinks, so I had to use a max depth to limit them a bit. In both cases they print exactly the same thing
Back to analyzing the boot logs
- This is curious: Original:
```
[    0.000000] psci: PSCIv1.0 detected in firmware.
```
U-boot:
```
[    0.000000] psci: PSCIv65535.65535 detected in firmware.
```
😄
I also see this (with U-boot):
```
[    3.036322] ------------[ cut here ]------------
[    3.036401] WARNING: CPU: 1 PID: 33 at drivers/clk/sunxi-ng/ccu_common.c:34 ccu_nm_set_rate+0x260/0x298
[    3.036425] Modules linked in:
[    3.036469] CPU: 1 PID: 33 Comm: kworker/1:1 Not tainted 5.4.61 #2
[    3.036492] Hardware name: Generic DT based system
[    3.036537] Workqueue: events start_work
[    3.036608] [<c010e048>] (unwind_backtrace) from [<c010a810>] (show_stack+0x10/0x14)
[    3.036660] [<c010a810>] (show_stack) from [<c05e803c>] (dump_stack+0x7c/0x98)
[    3.036712] [<c05e803c>] (dump_stack) from [<c011917c>] (__warn+0xb8/0xd0)
[    3.036763] [<c011917c>] (__warn) from [<c0119204>] (warn_slowpath_fmt+0x70/0x9c)
[    3.036818] [<c0119204>] (warn_slowpath_fmt) from [<c037b314>] (ccu_nm_set_rate+0x260/0x298)
[    3.036875] [<c037b314>] (ccu_nm_set_rate) from [<c03737e0>] (clk_change_rate+0x104/0x238)
[    3.036932] [<c03737e0>] (clk_change_rate) from [<c0374154>] (clk_core_set_rate_nolock+0x124/0x138)
[    3.036987] [<c0374154>] (clk_core_set_rate_nolock) from [<c03741a0>] (clk_set_rate+0x38/0x5c)
[    3.037042] [<c03741a0>] (clk_set_rate) from [<c0334af8>] (lcd_clk_enable+0x144/0x454)
[    3.037096] [<c0334af8>] (lcd_clk_enable) from [<c033575c>] (disp_lcd_enable+0x13c/0x2d4)
[    3.037155] [<c033575c>] (disp_lcd_enable) from [<c032fbf8>] (disp_device_attached_and_enable+0x134/0x264)
[    3.037215] [<c032fbf8>] (disp_device_attached_and_enable) from [<c032fe68>] (bsp_disp_device_switch+0x84/0xe8)
[    3.037271] [<c032fe68>] (bsp_disp_device_switch) from [<c032a7b4>] (disp_device_set_config+0xc8/0xf0)
[    3.037325] [<c032a7b4>] (disp_device_set_config) from [<c032a8e0>] (start_work+0x104/0x16c)
[    3.037383] [<c032a8e0>] (start_work) from [<c012f268>] (process_one_work+0x16c/0x20c)
[    3.037439] [<c012f268>] (process_one_work) from [<c012f874>] (worker_thread+0x230/0x2d4)
[    3.037494] [<c012f874>] (worker_thread) from [<c013403c>] (kthread+0x118/0x124)
[    3.037543] [<c013403c>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
[    3.037573] Exception stack(0xc6b4dfb0 to 0xc6b4dff8)
[    3.037610] dfa0:                                     00000000 00000000 00000000 00000000
[    3.037652] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    3.037691] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[    3.037736] ---[ end trace 22d4eeecfee7c9ff ]---
```
but other than that things look very similar
But they also appear in the log much more slowly from the very beginning
c
I don't get that, but that's probably because I have the display/LCD driver disabled on my build. The
warn_slowpath_fmt
makes me think it's just upset that some optimization isn't available?
And the PSCIv65535.65535 line appears because the old /psci node remains and Linux is trying to detect the version. 65535.65535 corresponds to 0xFFFFFFFF which is
PSCI_RET_NOT_SUPPORTED
. I think U-Boot itself is implementing v0.1?
(When U-Boot synthesizes the
/psci
node, it sets the proper version:
[    0.000000] psci: Using PSCI v0.1 Function IDs from DT
)
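The odd version string follows from how the 32-bit PSCI version value is split: the major number is the upper 16 bits and the minor number the lower 16. A minimal sketch (the function name is hypothetical) showing how an error return of -1 ends up printed as a version:

```python
def decode_psci_version(raw: int) -> tuple[int, int]:
    """Split a 32-bit PSCI_VERSION return value into (major, minor)."""
    raw &= 0xFFFFFFFF            # view a negative error code as unsigned 32-bit
    return (raw >> 16) & 0xFFFF, raw & 0xFFFF


PSCI_RET_NOT_SUPPORTED = -1      # 0xFFFFFFFF when read as unsigned

if __name__ == "__main__":
    for raw in (0x00010000, PSCI_RET_NOT_SUPPORTED):
        major, minor = decode_psci_version(raw)
        print(f"PSCIv{major}.{minor}")
```

The first value corresponds to the v1.0 reported by the official firmware; the second is the "not supported" return being misread as a version number.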
d
I'll try with it removed then
My new guess speaking of the boot time is that there might be something different with how the SPI NAND is initialized and the read/write operations might be slower
c
That's possible. The NAND is connected (and typically used) in quad-SPI mode. Maybe something with U-Boot is knocking it back down to 1-lane SPI?
d
This is more or less what I think after looking at the NAND chip datasheet
All the events are just slower almost linearly
I mean not really linearly, but more or less the same from the very beginning of the boot process
c
The way to rule out my CPU frequency hypothesis would be to run a very simple benchmark (e.g. something that times how long it takes to do 1 million iterations on a for loop?) and see if there's a speed difference with the two different boot methods.
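A rough sketch of such a benchmark, assuming a Python interpreter is available on the BMC (the stock image may not ship one). The absolute numbers include interpreter overhead, so only the ratio between the two boot methods would be meaningful; `time_loop` is a hypothetical helper, not the program used later in this thread.

```python
import time


def time_loop(iterations: int = 1_000_000) -> int:
    """Return the nanoseconds taken to run a simple counting loop."""
    start = time.perf_counter_ns()
    i = 0
    while i < iterations:
        i += 1
    return time.perf_counter_ns() - start


if __name__ == "__main__":
    elapsed = time_loop()
    print(f"Time to do 1000000 loops: {elapsed}ns")
```

Running it a few times under each boot method and comparing the averages should be enough to show a clock-speed difference of the size being discussed.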
Here are some benchmarks (I am on U-Boot) for SPI read speed, reading only the first 1MiB partition:
```
# time dd if=/dev/mtd0 of=/dev/zero
...
real    0m 1.15s
...
# time dd if=/dev/mtd0 of=/dev/zero
...
real    0m 1.12s
...
# time dd if=/dev/mtd0 of=/dev/zero
...
real    0m 1.09s
...
```
d
Original (I actually booted this one to look around):
```
# time dd if=/dev/mtd0 of=/dev/zero
...
real    0m 0.33s
...
# time dd if=/dev/mtd0 of=/dev/zero
...
real    0m 0.33s
...
# time dd if=/dev/mtd0 of=/dev/zero
...
real    0m 0.33s
...
```
c
Here's more of a CPU benchmark:
```
# time dd if=/dev/urandom of=/dev/zero bs=1024 count=65536
65536+0 records in
65536+0 records out
real    0m 10.39s
user    0m 0.13s
sys     0m 10.24s
```
d
```
# time dd if=/dev/urandom of=/dev/zero bs=1024 count=65536
65536+0 records in
65536+0 records out
real    0m 2.99s
user    0m 0.06s
sys     0m 2.93s
```
c
So you're running at about 3.5x the CPU speed of mine
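The ratio can be worked out from the `real` times quoted above. A quick check, with the values copied from the two sets of `dd` runs (the variable names are only illustrative):

```python
# `real` times in seconds, copied from the dd runs above.
uboot_urandom = 10.39
official_urandom = 2.99

uboot_spi = [1.15, 1.12, 1.09]
official_spi = [0.33, 0.33, 0.33]


def mean(values):
    """Arithmetic mean of a list of timings."""
    return sum(values) / len(values)


cpu_ratio = uboot_urandom / official_urandom
spi_ratio = mean(uboot_spi) / mean(official_spi)

if __name__ == "__main__":
    print(f"/dev/urandom slowdown: {cpu_ratio:.2f}x")
    print(f"SPI read slowdown:     {spi_ratio:.2f}x")
```

Both ratios come out close to each other, which fits the idea that the slower flash reads follow from the slower CPU clock.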
d
Yeah, maybe indeed this is something about the CPU and the NAND read speed just follows because of the slower clocks or something
c
```
# grep BogoMIPS /proc/cpuinfo
BogoMIPS        : 48.00
BogoMIPS        : 48.00
```
BogoMIPS is determined by a quick at-boot test Linux does where it times how many iterations of a 2- or 3- instruction loop it can do per second
d
```
# cat /proc/cpuinfo
processor       : 0
model name      : ARMv7 Processor rev 5 (v7l)
BogoMIPS        : 48.00
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xc07
CPU revision    : 5

processor       : 1
model name      : ARMv7 Processor rev 5 (v7l)
BogoMIPS        : 48.00
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xc07
CPU revision    : 5

Hardware        : Generic DT based system
Revision        : 0000
Serial          : 0000000000000000
```
I was doing the same test 😄
But this looks the same as in the boot log, and they're identical
c
So if BogoMIPS is always the same, it might be dynamically throttling-up and -down 🤔
Since if it was as simple as "the official boot process probably leaves the CPU clocked-up" it should affect BogoMIPS too
d
Another thing might be DRAM timings maybe
I'm thinking of some loop that executes NOPS
c
Good point, we are not using Allwinner's official DRAM initialization.
A loop that just executes NOPS would be a good measure because it should be small enough to fit exclusively in L1 cache
```
# ./bench 
Time to do 1000000000 loops: 7009826087ns
# ./bench 
Time to do 1000000000 loops: 7010556087ns
# ./bench 
Time to do 1000000000 loops: 7005443586ns
# ./bench 
Time to do 1000000000 loops: 7008143920ns
# reboot
...system reboots to official release...
# ./bench 
Time to do 1000000000 loops: 2003211084ns
# ./bench 
Time to do 1000000000 loops: 2003396126ns
# ./bench 
Time to do 1000000000 loops: 2003651126ns
# ./bench 
Time to do 1000000000 loops: 2001892501ns
```
There's that 3.5x factor again.
My
bench
program
d
So this feels like the CPU is clocked lower then, probably some sort of the dynamic clock which does not boost or is limited for some reason
I think this test almost certainly means it's not DRAM
Oh and thank you for sharing the build command. This is also something I want to learn more about, so I'm enjoying this additional information
c
I'm something of a Clang zealot, I always use it unless I have to rely on GCC for some reason.
(Or if something's already working just fine with GCC without any effort on my part.)
But yeah, agreed... which is also a shame, because if it was just bad DRAM timings, the fix might actually be pretty simple (once we identify the bad timing). CPU reclocking will require digging around for the missing register we're not hitting.
I'm also wondering if it's not a dynamic thing either. The 48.00 BogoMIPS feels wrong to me: even at the slower clocks, ~7.0ns/loop would mean ~142.9Mloops/sec, and since it's a 3-instruction loop, that's more like ~428.6 BogoMIPS. So it might be that the 48.00 is somehow "hardcoded" not measured, and the problem is that U-Boot is leaving the system at a lower clock rate when it enters Linux, and Linux doesn't know how to reclock.
(Which would mean that the official bootpath is upping the CPU to 100% clock and it stays there the whole time, and adding dynamic frequency scaling to our build of Linux would get us some power efficiency gains. 😎)
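The arithmetic above can be reproduced from the `bench` output. The sketch below uses the first run from each boot method. The last part is an assumption that is not confirmed anywhere in this thread: if the kernel calibrates its delay loop from a 24 MHz architected timer instead of measuring a busy loop, the reported BogoMIPS would be the timer rate divided by 500000, which would explain a constant 48.00.

```python
LOOPS = 1_000_000_000
INSTRUCTIONS_PER_LOOP = 3          # as stated in the discussion

uboot_ns = 7_009_826_087           # first ./bench run under U-Boot
official_ns = 2_003_211_084        # first ./bench run under the official boot


def implied_mips(total_ns: int) -> float:
    """Millions of instructions per second implied by one bench run."""
    loops_per_sec = LOOPS / (total_ns / 1e9)
    return loops_per_sec * INSTRUCTIONS_PER_LOOP / 1e6


# Assumption: timer-based delay calibration, BogoMIPS = timer_hz / 500000.
TIMER_HZ = 24_000_000
timer_bogomips = TIMER_HZ / 500_000

if __name__ == "__main__":
    print(f"U-Boot:   {implied_mips(uboot_ns):.1f} MIPS")
    print(f"official: {implied_mips(official_ns):.1f} MIPS")
    print(f"slowdown: {uboot_ns / official_ns:.2f}x")
    print(f"timer-derived BogoMIPS: {timer_bogomips:.2f}")
```

Either way, the measured loop rate is far above 48, so the BogoMIPS figure is not a useful indicator of the actual clock here.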
d
The measured power usage of the board alone while idling is about 4W 😛 But yeah 😉
I'm not sure where to look in the U-boot, but I might try to look at the docs and find the registers to poke to "unlock" it or at least set the frequency to max to see if this helps
c
We can always poke at registers manually before boot with the
mw
command
So if the user manual says there's a register that can change the core clock, you can try upping it to the 3.5x faster value with
mw
then proceeding with a normal boot (
boot
goes back to the normal boot process as though you hadn't interrupted it during the 2 second timeout)
d
This is good to know. I did not know about this command (as about many more too 😄 ). I'll take a look at this then
c
Official bootpath:
```
=> md.l 02001000 1
02001000: ca002900                               .)..
=>
```
U-Boot:
```
=> md.l 02001000 1
02001000: fa000b00                             ....
=>
```
So yeah there's some differences to
PLL_CPU_CTRL_REG
0x29+1
is N=42 and
0x0b+1
is N=12... which is our 3.5x
d
I did not even start looking in the manual and you already found it 🙂
c
24MHz * 12 is 288MHz, while the official bootpath seems to be 1.008GHz
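The decoding above can be written down as a short sketch. It only extracts the N field the way the discussion does (bits 15:8, plus one) and assumes any other dividers in the register are 1; the function names are hypothetical.

```python
OSC_HZ = 24_000_000   # 24 MHz reference, as used in the discussion


def pll_cpu_n(reg: int) -> int:
    """Return the N multiplier: bits 15:8 of the register, plus one."""
    return ((reg >> 8) & 0xFF) + 1


def pll_cpu_hz(reg: int) -> int:
    """Estimate the PLL output as 24 MHz * N.

    Assumption: any other dividers in the register are at 1, which is
    what the 288 MHz and 1.008 GHz figures in the discussion imply.
    """
    return OSC_HZ * pll_cpu_n(reg)


if __name__ == "__main__":
    for name, reg in (("official", 0xCA002900), ("U-Boot", 0xFA000B00)):
        print(f"{name}: N={pll_cpu_n(reg)}, {pll_cpu_hz(reg) / 1e6:.0f} MHz")
```

The two N values are 42 and 12, and 42 / 12 is exactly the 3.5x factor seen in the benchmarks.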
d
So the next question is - can we OC it? 😛
c
Ha you beat me to saying it! Supposedly that PLL is rated as high as 3.0GHz, so yeah OCing is possible. I would love to see a photo of a LN2-cooled BMC 😝
d
Or...
😛
I'm going to play with the registers anyway. Want to learn this, seems super useful in the future too
I think you might know the answer already, but if we can make it be controllable from the OS and the OS can handle controlling the CPU clocks, we might set it to even higher max speeds as long as they're not OC to make it snappier but still clock down when idling or executing less demanding tasks
c
Yeah, Linux has a whole cpufreq subsystem, and with the right driver it can manage the register internally. It can even do
ondemand
frequency scaling so it goes down to this 288MHz when idling and dynamically ramps up to 1.008GHz when under load.
(And speaking as someone who plans on doing a few SDR tasks with my TP2 board, being able to shift the BMC clock up or down a few steps might prove very useful for getting rid of interference.)
d
What does SDR stand for?
t
software defined radio
only SDR I'm doing is ADS-B feeding
c
Picture a soundcard with sampling rates MUCH higher than 48 kHz and a tunable frequency-translating circuit put in front of it.
So instead of audio, you're capturing radio waveforms.
(Including high-bandwidth ones like WiFi!)
d
Do you know of a way to read the current CPU frequency from the BMC's Linux?
c
Linux does have
/dev/mem
which is meant to give userspace raw access to the physical memory bus - but as that's obviously a big security hole most builds of Linux usually restrict it.
Let me find the CONFIG_ option real quick for unlocking it
It's
CONFIG_STRICT_DEVMEM
and good news, seems the official firmware has it off.
d
So no re-compilation will be needed
c
So you can just read 4 bytes from offset
0x02001000
of
/dev/mem
to learn the current
PLL_CPU_CTRL_REG
setting
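A minimal sketch of that read, assuming Python is available on the BMC, the process runs as root, and the kernel allows `/dev/mem` access to this range. `read_reg32` is a hypothetical helper; it maps the page containing the register instead of using a plain `read()`, since device registers are normally accessed through a mapping. Whether the access is performed as a single 32-bit load is not guaranteed by this code.

```python
import mmap
import struct

PLL_CPU_CTRL_REG = 0x02001000     # address used in the discussion


def read_reg32(addr: int, path: str = "/dev/mem") -> int:
    """Read one little-endian 32-bit word at physical address `addr`."""
    page = mmap.PAGESIZE
    base = addr & ~(page - 1)      # mmap offsets must be page-aligned
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), page, mmap.MAP_SHARED,
                       mmap.PROT_READ, offset=base) as mem:
            return struct.unpack_from("<I", mem, addr - base)[0]
```

On the board, `hex(read_reg32(PLL_CPU_CTRL_REG))` would then give the same word that `md.l 02001000 1` shows in U-Boot, provided nothing changed the register in between.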
d
I'll play with this a bit to see what I can find and what can be achieved. I feel like you are going to play with this too 🙂
c
I expect some BMC crashing 😉
I don't know if this register guarantees glitch-free switching, so you might end up freezing/crashing the BMC during the switchover.
d
I can also invoke writes from the U-boot and boot the system and then just read
c
(I guess you might need to clear the first bit first, then change the speed, then set the first bit again?)
True. But I doubt the register changes at all between U-Boot time and fully booted Linux.
I don't think the current build of Linux knows about the register.
d
Yeah, this is why I am worried a bit if it can control the clocks if the clocks are "unlocked". I still lack knowledge here too 😄 But these bits of information are a good start for me to try to play with it
What I want to try to do is to read the manual and see what's needed for the clocks to be adjustable by the OS, try to set it and see if Linux can control them and then if putting some load raises the clocks
Hope that makes sense
But yeah, an additional driver might be needed
But, after seeing how this OS is built, I would not be surprised if the support is there already, though
c
The relevant driver that's in charge of the CCU seems to be at:
buildroot/output/build/linux-*/drivers/clk/sunxi-ng/ccu-sun8iw20.c
, and it might already expose the proper knob to adjust, which would mean the devicetree just needs to be updated to tell Linux "this one is for the CPU"
I guess look at the definition at lines 38-48 in that file and see how it's exposed to the rest of the system?
d
I will. I mean I'm going to be away from home for a few days from Thursday and I may or may not be able to play with it in a few coming days (I'm not sure how much you are interested in looking at this)
I'm going back to work shortly but I may look at this today still, I'm just not sure if I'll be able to 🙂
c
My focus is probably going to be on getting mainline U-Boot able to support all of this stuff, and then I might end up making a microSD image for my TP2 boards so I can more easily upgrade the firmware while I keep developing on it.
Though being able to run the CPU at 1.008GHz rather than 288MHz sounds like a pretty necessary step before kicking the Allwinner code out. 🙂
d
Haha, yes 🙂
Just keeping it at this speed is more than enough
An ability to control the clocks dynamically will be only a nice bonus
At least the slow boot cause is now found and known 🙂
c
Yeah - the only missing thing in U-Boot now is the ability to load off of the NAND.
d
Right
c
I asked @svenrademakers if he would be cool with merging down the new U-Boot and temporarily losing NAND boot capability (to be brought back later) and he said that was a bit too drastic of a change to accept, which I guess makes sense.
d
Yeah, I'm not sure that'd be a good thing for the average users who now will be forced to also have an SD card and flash it additionally. Not to mention that the Turing Machines would have to start adding cards to the boards probably
It could, at least, be used by the CE which will only benefit from this (I mean to start)
I already want to re-base the repo with the official firmware and re-do some of the updates I've done to keep the added functionality
c
That would serve as a good proving ground too. Once we have a good solution to set the proper CPU clock, I think this build of U-Boot might be ready for daily use.
That gives us time to catch any other minor bugs and sharp corners before it gets merged to the main firmware.
d
And I'm sure many people will give it a try
I haven't had time to think about it more yet, but moving the OS onto the SD card is what I wanted from the beginning, to give it more space for the other binaries. Also some would become obsolete, like the flashing part that takes RAM (I forgot the name of this system process)
c
That build of U-Boot should be able to read the first partition if it's ext2/ext4/SquashFS formatted, not just FAT. Moving the rootfs (+ kernel zImage) to an ext or SquashFS partition sounds like a good first step.
I do want to get the patch series either merged or cleaned up so this version of U-Boot can be built by buildroot. The whole point of doing this is to get away from mystery binary blobs, after all.
(And redistributing the .bin I shared here to users wouldn't be good for GPL compliance either.)
d
> get away from mystery binary blobs
Yeah, that'd be great
Yes, I would love to see the whole image being able to be built by the buildroot (and we can even add your patches if they don't get merged, or before they do)
I think that when I re-base the repo and re-apply some of the changes (like ntp, etc), I'll just branch out and we can try things on the branch while keeping the main branch in-line with the official repo. Then we can merge or do the PR whenever or wherever applicable.
I mean this is my idea, I'm not sure if you like it, though
I'm open for all ideas and suggestions
c
I think Sven is interested in any/all of your downstream improvements being PR'd back into the official repo -- like ntp, for instance. CE might moreso be a fork for the things that are too large to be part of the standard firmware.
d
And this is good to hear, because I want to keep the CE in line with the official repo also to be able to make PRs (but this is only one of a few reasons we've discussed before already)
And then yes, the CE will add an ability to build SD card images, will need a way to easily expand the partition to cover the whole SD card, and will then have an ability to add stuff that does not fit the official firmware, either because the binaries are too big or because they don't fit the direction of the official firmware (different GUIs, to give a simple example)
c
I wonder how most projects that distribute SD card images handle the expansion problem. Do they just ship with a partition table that assumes, I dunno, 256MiB of available space, but on firstboot it resizes the last partition to take up all available SD card space?
d
This is how the RPi imaging works - you flash the card with the small partition and it expands it on the first boot. I did not check (yet) how exactly this is being done
Nvidia with their Jetson devices has a different approach - they create a partition and transfer files onto it, but this, obviously, is not the direction we want and possibly a challenge to solve if the firmware is to support flashing the Jetson modules too
For the BMC I imagine having an image and auto-expansion would be the way
I'm not knowledgeable enough about all the partition types, but I guess maybe updating the partition table will be enough to expand it?
c
I'm going to need to pick your brain a lot on how Jetson flashing works. I understand the CM4 way, at least conceptually: when the CM4 powers on in RPIBOOT mode, it enters a "DFU" mode where it waits for the OS to be uploaded straight into RAM, and the Pi Foundation's
usbboot
tool leverages this to (by default) give it an OS image that just exposes the eMMC as a USB MSDC. And then we can write a full OS to the eMMC.
But it does sound like Jetson doesn't follow this same approach.
d
I never analyzed it fully, but it's downloading the packages to the OS (this is why the host OS must be the same as the OS you're building an image for, with some exceptions), then it builds a bunch of small images for each of the partitions (of which there are plenty) and flashes them along with the boot one. It also accesses the on-board QSPI flash (U-boot maybe?) and can tunnel SSH through the exposed USB for the configuration, and the whole process includes a few reboots when the USB device id changes. I only know that briefly, but when the time comes, we can analyze the full flashing log, which is book-sized. Previously, I've been using only the SDK Manager to flash the modules, but since it does not support custom boards (which also have their own flash, for a reason I don't fully know yet), I'm using the "by hand" approach where you run a bunch of scripts to do so, which also lets me modify some file that disables the on-board flash to make it work. But this, like I said, also yields a big log that can be analyzed to understand and document the process.
c
Sounds like Jetsons are nowhere near as simple as CM4s 😬
d
Naaaaah, nowhere near that. You cannot expose its storage and
dd
the image
The scripts I mentioned also need the module connected as they're being queried during the build process, but I can possibly imagine that we could build the images for different modules externally and use them with the BMC (but there will be still some challenges to solve). Either way there are no ready images per each version that could be used like with the RPis
It may also be a thing that the flashing might require some proprietary software, though
(so far there's no way to flash from the ARM devices)
c
> You cannot expose its storage and
dd
the image
Are there hardware/firmware reasons for that (e.g. these are designed for "trusted computing" situations, and the Jetson enforces partitions of the storage that are only accessible depending on the current privilege level)? Or is it just a matter of "NVIDIA is busy making a package manager that suits everyone's use case and never bothered with a
usbboot
equivalent?" I guess what I'm wondering is, if we had the requisite patience and time, could we conceivably write something like
usbboot
but for Jetson nodes, or is that impossible in principle?
d
I guess it's a way to control the whole process within Nvidia's ecosystem. There's no room for tinkerers and community, they deliver things that work. The Jetson modules were never really thought to be used by people like us (there are the Dev Kits like AGX Orin that you can use for development, but that's a complete box and not just a module) but to be used in commercial products, so this is part of why
The same way Nvidia's drivers are not open-sourced
Or for the same reason maybe
There are some attempts to open-source the drivers (I was not following it lately), but not for the CUDA and cuDNN part - they will remain closed-source
c
Yeah, and I guess knowing NVIDIA they probably have the Jetson module checking the signature of whatever payload is uploaded in "RPIBOOT" mode, so we can't make our own if we wanted to.
d
To answer your last questions more precisely I'll need to analyze the flashing process more, though
c
I'll have to look into it too I guess. I might also try to get some Wireshark captures of the raw USB traffic.
d
This reminds me of some reverse-engineering I've done to document some protocols 🙂 I'm not an expert in Wireshark, but I used it many times to analyze and reverse-engineer some network protocols. I remember, for example, documenting the way that some controller controls some drone via WiFi and I ended up setting a tunnel using my laptop and wired connection to my workstation to make it easier (I forgot the details why I had to do it this way)
Oh, I know why - WPA2 😄
Probably
Maybe
Iono
I forgot already
c
Protocol reverse-engineering is always fun. It's a unique feeling of accomplishment when you first make some proprietary black-box thing talk to your own implementation.
d
Yeah, super satisfactory 🙂
I've done it mostly to be able to control it from Python, but the project has never been finished because of some other issues (the drone was made by an Indian company and we could not get the full protocol documentation from them)
c
Hah at that point I'd start thinking about hacking the drone itself to put Arducopter on it
d
The goal was a bit different though - to also let others use this solution with the stock firmware, it's a part of the bigger story
c
Ah I see, not a one-off project
d
ML-related
w
I believe it's a bit more complex than that.
For ext4 at least, you need to update the partition table, and then you need to
resize2fs
to modify the superblock or whatever to tell the ext4 driver that it can use the extra space
d
Then we can use the
resize2fs
. Thank you!
w
If your system uses
systemd
then I seem to recall some kind of expand filesystems thing that's built in
idk how Raspberry Pi works, but there's a kernel cmdline parameter that turns on the "expand the root file system on boot" behaviour
d
It's the BMC, so the SystemV, at least for now
w
either way it's conceptually simple, however you do the orchestration. On boot, check if there is free space after the root partition. If so, expand the partition to fill the space and run whatever the filesystem wants you to run to tell it about the extra space
d
Yeah, this part is not a problem. I did not know if we can simply update the partition table or whether any fs-specific tasks must be done too
w
right
c
I think there are 3 parts to this:
- the capacity of the SD has to be detected and the partition table updated to reflect that
- the final partition has to be expanded to consume the newly-discovered free space
- the filesystem on that partition has to be resized to use the full partition
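The three parts above could be planned roughly as follows. This is a sketch under stated assumptions, not a tested first-boot script: `plan_expansion` and `resize_command` are hypothetical helpers, the device name is only an example, sizes are in 512-byte sectors, and the partition table rewrite itself is tool-specific and not shown. It also ignores the space a GPT backup header would need at the end of the disk.

```python
def plan_expansion(disk_sectors: int, part_start: int, part_sectors: int,
                   align: int = 2048) -> int:
    """Return the new size (in sectors) for the last partition.

    The partition keeps its start and grows to the last aligned sector
    boundary on the disk. If there is no room to grow, the current size
    is returned unchanged.
    """
    usable_end = (disk_sectors // align) * align   # round down to alignment
    new_sectors = usable_end - part_start
    return max(new_sectors, part_sectors)


def resize_command(part_dev: str) -> list[str]:
    """Command that grows an ext2/3/4 filesystem to fill its partition."""
    return ["resize2fs", part_dev]


if __name__ == "__main__":
    # Hypothetical numbers: a small image written to a larger card.
    new_size = plan_expansion(disk_sectors=10_000, part_start=2048,
                              part_sectors=1000)
    print(f"grow last partition to {new_size} sectors")
    print("then run:", " ".join(resize_command("/dev/mmcblk0p2")))
```

The disk size in sectors would come from the kernel (for example the `size` file under `/sys/block/` for the device), and the filesystem step only applies to ext filesystems; other filesystems have their own grow tools.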
d
Indeed, this is what I have in mind
c
One thing I'd like to nudge you in the direction of: consider making the rootfs a SquashFS with a read-write overlay. The advantage to this is SquashFS gets you a fair amount of compression, and it's necessarily read-only so the "fresh install" is kept intact (a "factory reset" consists of just wiping the overlay). You can then just format the overlay on first boot.
(Keeping the golden rootfs intact is one of the reasons I want to push for it on the official firmware too: it makes the "installer" SD pretty simple.)
d
Hmmm, I did not consider SquashFS, I'll have to think about it and also maybe see if it is going to be adopted in the official firmware. I'm not sure if we need compression (but it'd be useful in the Flash for sure). Fresh install with the CE is not necessarily something people will be using, but maybe? I'm trying to think about the advantages compared to, for example, ext4 for the CE. I'll definitely play with it, never used it before
c
Yeah, it's often hard to argue with the simplicity of "one single ext4" which is much more appropriate for SD than for flash.
But ext4 is completely untenable on NAND. I'm hoping for SquashFS+UBIFS overlay there 🤞
d
Yeah, I know that the ext4 is not suitable for the flash and why. The process of preparing the SD card image will be separate from preparing the image for the flash, though, so we can use whatever. I'm just trying to find out the pros of using SquashFS
c
The only pro is pretty much the space-saving from compression. It requires that the filesystem be packed ahead of time, so it's necessarily read-only after it's created. I know Buildroot is very friendly to making SquashFS roots, but the operational complexity of needing an overlay can make it not worth the benefit.
d
Also, the read speeds from the SD card can be slower than from the flash? Compression might help, or make them slower, plus, like you mentioned, the complexity adds more to the CPU load too. Something to look at for sure. I guess we can build images for both and simply compare
s
Yes please!
I've not had the time to dive into the low-level stuff yet since I'm on the project, but reading the stuff you're doing makes me learn a lot as well. The feature/flash_cm4 branch is almost ready to get PR'ed back to master, which will bring Raspberry CM4 flashing support. (It does not reflect the latest dts improvements https://github.com/turing-machines/BMC-Firmware/pull/49 yet with regards to power management, but I hope to update that shortly after.) I just ordered a Jetson, and hope to add support for this one as well. RK1 support will have priority though
j
Maybe add support for f2fs with compression to the BMC's kernel in addition to ext2/3/4, fat and exfat. f2fs is specifically targeted at flash devices with a flash translation layer, like eMMCs and SD cards. Most Android phones use it nowadays under the hood.
c
Hey @svenrademakers I notice you just bumped the GCC toolchain version from a prebuilt archive to having BR2 take care of it. Heck yeah! Since now seems like a good time to do this, how about also moving from glibc to musl? It's generally regarded as a better choice in memory-constrained environments like ours, it's a lot less picky about version mismatches, and it's also well-supported by Rust. Any thoughts from you or others here?
s
I think musl also has a better reputation when it comes to security and the turnaround times to fix them. I'm open to it, but currently I don't see any direct need to switch over? Maybe someone can convince me otherwise. I'm thinking a good idea could be to make a build and just compare/test-drive it. Btw, musl support in Buildroot is experimental if I recall correctly.
c
I'm not seeing anything saying musl is experimental -- though I suppose a few packages might not build without some patching. And yeah, it's not that glibc is a bad choice. My inner monologue's reasoning is something like "musl offers a bunch of advantages over glibc, changing over would require that people installing out-of-tree packages/binaries on their BMCs update their own toolchain, but since we're changing the toolchain now anyway we may as well combine it with this other change before time passes and people start to rely on glibc"
This question had me wondering so I decided to investigate: the highest clock rate I can boot my BMC at without it crashing on boot is 1416MHz, which is an impressive ~40% increase over the stock setting (1008MHz)! The commands to add to the top of your boot script, if you feel so inclined:
Copy code
mw 02001000 ca003a00
sleep 1
No idea if it's actually stable after boot since I haven't done much more than SSH in and run my
bench
program a few times. I wouldn't necessarily recommend any user seriously rely on this - I just did it for fun. 🙂
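For reading the two `mw 02001000 ...` values that appear in this thread, here is a minimal sketch. It assumes the word written to `02001000` carries the PLL multiplier field in bits 15:8 and that the output is 24 MHz × (field + 1); that layout is an assumption inferred from the numbers quoted here (`ca003a00` decodes to 1416 MHz and `ca002900` to 1008 MHz), not something confirmed from the manual in this thread. The helper names are made up, and the upper control bits are simply copied from the quoted values.

```python
OSC_HZ = 24_000_000  # assumed 24 MHz reference oscillator


def pll_cpu_hz(reg_value):
    """Decode the CPU PLL frequency from a register word (assumed layout)."""
    n_field = (reg_value >> 8) & 0xFF  # assumed: multiplier field in bits 15:8
    return OSC_HZ * (n_field + 1)


def pll_cpu_word(target_hz, control_bits=0xCA000000):
    """Build a register word for target_hz, reusing the quoted control bits."""
    if target_hz % OSC_HZ or not (OSC_HZ <= target_hz <= 256 * OSC_HZ):
        raise ValueError("target must be a multiple of 24 MHz within field range")
    n_field = target_hz // OSC_HZ - 1
    return control_bits | (n_field << 8)


if __name__ == "__main__":
    for word in (0xCA003A00, 0xCA002900):
        print(f"{word:08x} -> {pll_cpu_hz(word) // 1_000_000} MHz")
```

This is only for decoding and sanity-checking values; it says nothing about which frequencies are stable or safe on real hardware.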
d
Finding the max stable clock is something I want to investigate too, as well as the generated heat. Clocking it even higher might require higher core voltage, but I'm also not sure (yet) what else depends on this clock so the instability or crashing might be not necessarily the core speed. Of course OC should not be used in production, more like a curious experiment. I can suspect someone might want to clock it higher, but that definitely will void the warranty and I'm going to point this out every time 🙂
c
And if the goal is more speed, you can only do so much with core clock, eventually one needs to start digging into DRAM clock/timings.
d
Indeed. OC-ing DRAM can give you a nice performance boost too
c
I'm probably going to standardize at around 1.2GHz for my own uses; a 20% OC is nothing to scoff at but I'm not pushing the envelope too much either. 😄
d
Yeah, this is super reasonable and also a value I had in mind as possibly still super stable
I also guess I'll experiment with this on the MangoPis - much cheaper in case of the magic smoke or any other catastrophic failure 😄
c
I'm hoping no magic smoke given I'm not upping any voltages, but it's impossible to guarantee against so 🤷‍♂️
d
😄
c
btw U-Boot is already computing deterministic MAC addresses for us from the T113 serial number, and it's storing that address in the
ethaddr
variable in its own env
All we have to do is figure out how to communicate it to the kernel (either how it wants that added to the cmdline or, better yet, modified into the fdt) and we get deterministic
02:
MAC addresses without needing to figure out how to do it post-boot
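The `02:` prefix matters because of two flag bits in the first octet of a MAC address: bit 0 marks multicast and bit 1 marks a locally administered address (one not drawn from a vendor's OUI block). A minimal sketch of that check, with illustrative function names:

```python
def first_octet(mac):
    """Return the first octet of a colon-separated MAC as an integer."""
    return int(mac.split(":")[0], 16)


def is_locally_administered(mac):
    """True if the locally-administered bit (0x02 of the first octet) is set."""
    return bool(first_octet(mac) & 0x02)


def is_unicast(mac):
    """True if the multicast bit (0x01 of the first octet) is clear."""
    return not (first_octet(mac) & 0x01)


if __name__ == "__main__":
    for mac in ("02:00:c3:c3:e0:f3", "12:34:56:78:9A:BC", "00:0a:35:00:00:02"):
        print(mac, is_locally_administered(mac), is_unicast(mac))
```

Any generated address should pass both checks, which is why starting with `02:` is a convenient convention.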
d
Indeed:
Copy code
=> echo ${ethaddr}
02:00:c3:c3:e0:f3
In the env.cfg in the official firmware I've seen the
mac_addr
parameter in the bootargs so I tried to add
mac_addr=${ethaddr}
and validated later under Linux with
cat /proc/cmdline
, but this seems to have no effect anywhere (I'm expecting it's being used to set it in the device tree, but possibly not with U-boot?). So, I read about how you usually pass the MAC address and it seems like the way to do that is to set it in the dtb. There are 2 properties: 1.
local-mac-address
which is the physical address of the ethernet interface 2.
mac-address
which is the last used MAC (I did not dig deep enough into what exactly this is). In Linux, I checked:
Copy code
# cat /proc/device-tree/aliases/gmac0
/soc@3000000/eth@4500000#
And then `ls -l /sys/firmware/devicetree/base/soc@3000000/eth@4500000/` but neither
mac-address
nor
local-mac-address
are set. From what I read, U-boot should set
local-mac-address
in dtb from the
ethaddr
variable. So I checked what happens if I add these by hand:
Copy code
=> fdt resize 128
=> fdt set /soc@3000000/eth@4500000 local-mac-address ${ethaddr}
=> fdt set /soc@3000000/eth@4500000 mac-address ${ethaddr}
And now under Linux:
Copy code
# cat /sys/firmware/devicetree/base/soc@3000000/eth@4500000/local-mac-address
02:00:c3:c3:e0:f3
# cat /sys/firmware/devicetree/base/soc@3000000/eth@4500000/mac-address
02:00:c3:c3:e0:f3
but the MAC is not being set. This means 2 things in my opinion: 1. U-boot should set this property, most likely when you invoke the board-specific setup (an equivalent of
fdt boardsetup
but after you run
bootz
) 2. The ethernet drivers do not use this to set the MAC and it might be beneficial to look into the driver code to see if the drivers are trying to read the MAC from anywhere
Speaking of the
fdt boardsetup
, since you mentioned you don't know what exactly it is doing - I checked and the only change is (left - before, right - after invoking it):
So this is mostly things related to USB (most likely important) and audio (should be removed?)
I mean there might be some other modifications as well that U-boot is performing after you invoke bootz, but these are the ones I can see from U-boot
I started digging from the driver and this is what I found so far:
Copy code
# cat /sys/class/net/eth0/addr_assign_type
0
Where:
Copy code
What:        /sys/class/net/<iface>/addr_assign_type
Date:        July 2010
KernelVersion:    3.2
Contact:    netdev@vger.kernel.org
Description:
        Indicates the address assignment type. Possible values are:
        0: permanent address
        1: randomly generated
        2: stolen from another device
        3: set using dev_set_mac_address
Then:
Copy code
# cat /sys/class/net/eth0/address
e6:0e:b1:15:6e:d3
And this is indeed the address I can see in
ip a
. So now I'll try to find how it is being set
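A small decoder for the `addr_assign_type` value, using the descriptions from the kernel ABI text quoted above. This is a minimal sketch with an illustrative function name. Note that 0 is also the kernel's default (`NET_ADDR_PERM`), so reading 0 does not by itself prove that a driver supplied a real permanent address.

```python
# Descriptions copied from the sysfs ABI text quoted in the thread.
ADDR_ASSIGN_TYPES = {
    0: "permanent address",
    1: "randomly generated",
    2: "stolen from another device",
    3: "set using dev_set_mac_address",
}


def describe_addr_assign_type(raw):
    """Map the text read from addr_assign_type to its documented meaning."""
    value = int(raw.strip())
    return ADDR_ASSIGN_TYPES.get(value, f"unknown ({value})")


if __name__ == "__main__":
    # On a Linux system the input would come from
    # /sys/class/net/eth0/addr_assign_type; a literal is used here.
    print(describe_addr_assign_type("0\n"))
```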
After digging more into the code, I found in the
net/core/dev.c
->
int register_netdevice(struct net_device *dev)
this:
Copy code
    add_device_randomness(dev->dev_addr, dev->addr_len);

    /* If the device has permanent device address, driver should
     * set dev_addr and also addr_assign_type should be set to
     * NET_ADDR_PERM (default value).
     */
    if (dev->addr_assign_type == NET_ADDR_PERM)
        memcpy(dev->perm_addr, dev->dev_addr, dev->addr_len);
and
dev_addr
is not being set by the
sunxi-gmac
, so it'll require patching the driver? If the
dev_addr
is set by the driver, it'll be used by the network subsystem. At least this is what I understand from the code. An alternative would be to set the mac using
/etc/network/if-up.d
(
/etc/network/if-pre-up.d
?) after reading it from the device tree (
/sys/firmware/devicetree/base/soc@3000000/eth@4500000/local-mac-address
). We can add a script to the filesystem overlay in the buildroot. After adding MAC to the device tree, I tested this by hand:
Copy code
ip link set dev eth0 down
ip link set dev eth0 address $(cat /sys/firmware/devicetree/base/soc@3000000/eth@4500000/local-mac-address)
ip link set dev eth0 up
and it works like a charm. What do you think?
Also, for the purpose of leaving notes:
Copy code
setenv bootargs earlycon=uart8250,mmio32,0x02500c00 console=ttyS3,115200 loglevel=8 aw-ubi-spinand.ubootblks=24 ubi.mtd=sys root=ubi0_5 rootfstype=ubifs,rw init=/sbin/init partitions=mbr@ubi0_0:boot-resource@ubi0_1:env@ubi0_2:env-redund@ubi0_3:boot@ubi0_4:rootfs@ubi0_5:recovery@ubi0_6:dsp0@ubi0_7:private@ubi0_8:UDISK@ubi0_9: cma=8M
load mmc 0 ${kernel_addr_r} zImage
load mmc 0 ${fdt_addr_r} sun8iw20p1-t113-100ask-t113-pro.dtb
fdt addr ${fdt_addr_r}
fdt rm /psci
fdt rm /firmware/optee
fdt resize 128
fdt set /soc@3000000/eth@4500000 mac-address ${ethaddr}
fdt set /soc@3000000/eth@4500000 local-mac-address ${ethaddr}
#mw 02001000 ca003a00 #OC
mw 02001000 ca002900
sleep 1
bootz ${kernel_addr_r} - ${fdt_addr_r}
c
I'm suspecting that Linux wants
mac-address
(or
local-mac-address
, I don't know the difference) in 6-byte binary not :-separated-ASCII-hex form
d
I'm an idiot, indeed
I learned so much today and I must have missed the format which should look more like this:
Copy code
local-mac-address = [00 0a 35 00 00 02];
I'll format it correctly and test tomorrow
c
👍
I'm hoping there's an easy way to have U-Boot set the binary format straight from the ASCII var. Maybe whatever function parses the hex can treat
:
as whitespace? 🤔
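The conversion being wished for here, from the colon-separated ASCII form to the bracketed byte list that `fdt set` accepts, is simple to do outside U-Boot. A minimal sketch, assuming the value would be prepared on a host or in a build script rather than in the U-Boot shell; the function name is illustrative:

```python
def mac_to_fdt_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to the '[aa bb cc dd ee ff]' byte-list form."""
    parts = mac.strip().split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 octets, got {len(parts)}")
    octets = [int(part, 16) for part in parts]  # raises ValueError on bad hex
    if any(not 0 <= octet <= 0xFF for octet in octets):
        raise ValueError("octet out of range")
    return "[" + " ".join(f"{octet:02x}" for octet in octets) + "]"


if __name__ == "__main__":
    print(mac_to_fdt_bytes("02:00:c3:c3:e0:f3"))
```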
I also notice you kept the cma=8M, does it increase MemTotal when you reduce CMA like that? (I observed that my U-Boot setup gives me slightly more MemTotal than the official bootpath, and I'm thinking it's because I'm not reserving RAM for OP-TEE anymore.)
d
There's only one way to find out 🙂 If not, I'll figure it out
Yes, it reserves 16MB otherwise and the firmware only uses like 1-ish MB iirc
c
Does Linux treat CMA as separate from MemTotal? I'm guessing it does.
d
The consoleblank thing is leftover after some tests - forgot to remove it
c
MemTotal:         113796 kB
with no
cma=
in my cmdline
d
I do not recall it lowers the total amount of memory, but it's reserved
So it looks like it does
c
Ah you have ~8MB more MemTotal than that?
d
I'm pretty sure we can use CMA even lower if we need more available RAM
I would have to check this to be 100% sure (can't do that now, packing up and leaving shortly)
c
I'm just glad that this new bootpath is saving at least 8MiB more RAM 😄
d
But I checked that the firmware does not use more than 2MB so there's no reason to waste this RAM 🙂
c
I think the only thing that actually needs to stay resident is the little stub of PSCI code that's uh...
checks
8KiB in this build
My news in U-Boot land: - I found the correct build-time config option to have the SPL set the CPU clock rate, and have updated that to 1008 MHz to match what Allwinner's bootloader does. It's easy to change it, and we could even make it higher (with the blessing of TMs, of course) before it's merged to the official firmware. - I discovered my compiler was not generating Thumb code and fixed that. This is a big relief as I was starting to panic about running out of space in the SPL (there's a 32KiB soft limit). - I got the USB gadget driver working. I'm currently figuring out how hard it would be to make an open source PhoenixSuit replacement with this. - I'm currently looking into getting the SPL to load U-Boot from the NAND, which is the last major feature we need before this is viable. - I've mentally worked out a flowchart for the ideal boot flow that U-Boot should follow, which provides a few (5, by my count) different recovery options for users who have locked themselves out or otherwise bricked their BMC with a bad flash.
That last one I should probably diagram up and share since others might have good input that I'd otherwise miss if I just went ahead and implemented it 😅
d
Nice!
And yes for the diagram 🙂
Whenever you have a new image, I could start using it instead of what I am using now 🙂
I forgot to take a MicroSD card reader with me so I'll need to see if I can write a card using TPi2 itself (BMC) or maybe use an Orin Nano flashed onto the M.2 NVMe (my Nano comes from devkit so it has an SD card slot)
Ok, I tested it by setting MAC by hand using the correct format and it worked:
Copy code
fdt set /soc@3000000/eth@4500000 mac-address [12 34 56 78 9A BC]
fdt set /soc@3000000/eth@4500000 local-mac-address [12 34 56 78 9A BC]
So we have a way. Now to set it from the generated MAC correctly
c
It looks like the sunxi
ft_board_setup
function calls
fdt_fixup_ethernet
which sets
mac-address
and
local-mac-address
if there is an alias for
ethernet0
fdt set /aliases ethernet0 "/soc@3000000/eth@4500000"
+ boardsetup seems to fill in
local-mac-address
correctly; wonder if we should just update the .DTS to have this alias?
d
I was looking into setting this (first wanted to check the code, though, to see if I can see anything else there)
I think it won't hurt in any way
And will make it easier since I don't have to parse the string to set it. I started looking at the
setexpr
before you replied since something like
${var:0:2}
does not work in U-boot
c
Ah, yeah, the U-Boot shell is only sh-inspired, it's not fully bashlike 😔
d
Also,
fdt
did not let me do this:
Copy code
=> fdt set /soc@3000000/eth@4500000 mac-address [${ethaddr}]
Unexpected character ':'
😄
Which makes sense, of course
I tried your alias and the eth0 still has this test MAC I used previously, hmmm
c
I tried the same and got the correct address in
local-mac-address
but a random address in
/sys/class/net/eth0/address
Is this only respecting
mac-address
and ignoring the local one?
fdt_fixup_ethernet
will create
local-mac-address
but will only set
mac-address
if it's already present
d
mac-address
is the last used MAC, whatever this means, while
local-mac-address
is the physical address
I understand this like
local-mac-address
is the physical one and
mac-address
can be used to set own (and has a priority)
Testing
Tried again and still see the previously set testing MAC
I'll see with the
mac-address
present now
c
Whoa, yeah, it did the same thing with my test just now.
d
Ok, so this is not enough:
Copy code
fdt set /soc@3000000/eth@4500000 mac-address []
For the
fdt boardsetup
to set it, there must be a MAC already present
c
Odd, I have the same thing and it did work for me
d
I first tested with the OS (without
fdt boardsetup
) and then tested in U-boot
Hmm, I'm going to re-test
Yeah, I goofed
It works
c
For me it seems like whether
mac-address
is present or not it uses a random assignment derived from
local-mac-address
d
Did you call
fdt boardsetup
? I assumed it's being invoked when you run
bootz
I'm asking because my MAC is not being updated
It's no longer random too
c
Yeah it runs implicitly
d
Copy code
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 12:34:56:78:9a:bc brd ff:ff:ff:ff:ff:ff
c
I've been confirming with
xxd /sys/firmware/devicetree/base/soc@3000000/eth@4500000/local-mac-address
d
It's still this
c
Oh that's very weird. For me it randomized
4a:13:e4:f9:79:75
and has been sticking with it.
I'm predicting it's actually another one of those "memory isn't being initialized so it's using the value left over from last boot" situations
Perhaps due to a driver bug it isn't respecting
local-mac-address
, but that IS enough to tell it not to randomize one
d
Copy code
# xxd /sys/firmware/devicetree/base/soc@3000000/eth@4500000/local-mac-address
00000000: 0200 fadb 6917                           ....i.
# xxd /sys/firmware/devicetree/base/soc@3000000/eth@4500000/mac-address
00000000: 0200 fadb 6917                           ....i.
# ip a
[...]
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 12:34:56:78:9a:bc brd ff:ff:ff:ff:ff:ff
[...]
So yes, it read this once and never again
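As the `xxd` output shows, once the property is set correctly it holds six raw bytes rather than text, so any script that reads it back (for example a network hook that applies it with `ip link`) has to format it first. A minimal sketch of that step; the function names are illustrative and the path in the comment is the one used in this thread:

```python
def format_dt_mac(raw):
    """Format the 6 raw bytes of a device-tree MAC property as aa:bb:cc:dd:ee:ff."""
    if len(raw) != 6:
        raise ValueError(f"expected 6 bytes, got {len(raw)}")
    return ":".join(f"{byte:02x}" for byte in raw)


def read_dt_mac(path):
    """Read a MAC property file in binary mode and format it."""
    # Example path from the thread:
    # /sys/firmware/devicetree/base/soc@3000000/eth@4500000/local-mac-address
    with open(path, "rb") as prop:
        return format_dt_mac(prop.read())


if __name__ == "__main__":
    # Same bytes as in the xxd dump above.
    print(format_dt_mac(bytes.fromhex("0200fadb6917")))
```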
c
What if you take the
ethernet0
alias back out and go back to setting (only)
local-mac-address
by hand?
d
Also:
Copy code
# cat /sys/class/net/eth0/addr_assign_type
3
and I don't think this is correct
Should be 0
c
It's 0 for me, very weird
d
It was 0 before
c
No chance there's a forgotten hwaddress directive in /etc/network/interfaces?
d
3 means it's being set in the OS if I'm not wrong
Hmmm
LOL
OK, I used the other board this time and I forgot I have this set
LOL
LOL
😄
c
Well, this is why we do these things in pairs I guess, it's a nice control for the forgotten-about settings 🙂
d
Yeah
I completely forgot
This is a leftover from a long time ago when I tested static MACs
This is why I thought this: https://discord.com/channels/754950670175436841/1080282784570019942/1114325672177971200 worked, since I used the same MAC as was set here. I just happened to connect my other board now to test this
c
So it's setting some unusual address in response to
local-mac-address
being present?
d
I'm testing again now
c
Because for me, it's not respecting that property, but it is setting a consistent address
d
Looks like it's indeed not respecting it and is still setting a random MAC
Yeah, a random MAC every boot still
c
I'm still getting random-but-consistent MAC
Trying to see how much I need to pare down my config before it goes back to true random
d
What's the "consistent" part of it?
2 last I saved:
Copy code
26:96:8b:49:49:74
ae:43:61:ac:1a:48
c
It is always assigning
4A:13:E4:F9:79:75
(no it's not similar to my U-Boot generated
ethaddr
at all)
d
No chance there's a forgotten hwaddress directive in /etc/network/interfaces?
😛
c
Aha I checked! (Also I always use
02:00:
for my stuff)
d
So where is it coming from, then?
c
My current theory is it's hashing
local-mac-address
but barring that, no idea
I'm going to try randomizing it on the next boot
d
It does not do that in my case, I checked that
c
OK I just pulled everything back out and it's still assigning
4A:13:E4:F9:79:75
(assign type
0
), what gives
But this does mean it's ignoring
local-mac-address
entirely
d
Can you try the U-boot image you gave me last time?
c
Maybe it's trying to randomize it and just always happening before the entropy source is up
I'm thinking it might be my kernel image
d
May be as well
I'm using stock firmware for now (almost, it's CE 0.1.0, but same for this part as the official)
c
But still it's assign type 0, and 1 is for randomized 🤔
d
It always randomizes a MAC and sets given one only if it exists?
I'm not sure what is it copying there then
I did not track it this much
c
So what, we're thinking uninitialized memory?
d
Possibly
Or it sets something there in some way
c
I can try going back to U-Boot, having it zero out RAM, then boot again...
Copy code
40000000-47ffffff : System RAM
  40008000-407fffff : Kernel code
  40900000-4095b7e7 : Kernel data
Easy enough to wipe those physical ranges and then try a fresh boot
d
Or disconnect the board from power for a bit
c
I'll try that next if this doesn't work
(Too lazy to go to the room my board is in)
d
But then, it is random for me every time, I have 0 for this type and did not disconnect the board
c
Just did
mw.b 40008000 0 a00000
to see about maybe zeroing RAM
Still
4A:13:E4:F9:79:75
wow
d
I'm tempted (though also lazy) to try this in the other board I have to see what I get
c
I don't have
Use random mac address
in my kernel log at least
d
My other board also yields random MACs
c
Ha, I think I might know what's going on. I updated my kernel config to add some crypto modules (e.g. MD5) to the kernel image so they aren't loaded later in boot
Which I think means I have inadvertently "fixed"
drivers/net/ethernet/allwinner/sunxi-gmac.c:geth_chip_hwaddr
d
Copy code
[    3.179703] eth0: Use random mac address
c
Added this to my bootargs:
sunxi_gmac.mac_str=${ethaddr}
Behaves exactly as expected
Really hating the current situation though, but it also means that PR #50 will, though not intended at all, get people on a stable MAC address (cc @svenrademakers)
(I don't really like that the firmware's stable MAC differs from U-Boot's stable MAC -- would like to unify those two so people aren't caught off guard when recovery ops use a different address)
d
Works indeed. Before, I've been trying
mac_addr
but not
sunxi_gmac.mac_str
. Incorrectly as can be seen
What's wrong with this solution?
c
I do like the
sunxi_gmac.mac_str=${ethaddr}
solution, but I'm a little bothered to learn that my PR #50 changes will inadvertently land us on one solution for generating a stable MAC that may differ from our final one. On one hand, it's nice to have stable MACs earlier (i.e. before we get the U-Boot stuff stabilized), but on the other hand I worry about people growing attached to the MD5-generated one and getting upset when U-Boot does something else.
If push comes to shove, I can always port the Linux driver's MD5-based MAC generation algorithm back over to U-Boot's board/sunxi.c, so I guess it's not too big of a deal
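To illustrate the general idea of a hash-derived stable MAC, here is a minimal sketch. It is explicitly not the algorithm used by the Linux driver's `geth_chip_hwaddr` or by U-Boot's sunxi board code, neither of which is reproduced in this thread; the function name and the input bytes are hypothetical. It hashes a per-chip identifier with MD5, keeps six bytes, and forces the first octet to be unicast and locally administered.

```python
import hashlib


def stable_mac_from_id(chip_id: bytes) -> str:
    """Derive a deterministic MAC from a per-chip identifier (illustrative only)."""
    digest = hashlib.md5(chip_id).digest()
    octets = bytearray(digest[:6])
    octets[0] &= 0xFE  # clear the multicast bit
    octets[0] |= 0x02  # set the locally-administered bit
    return ":".join(f"{byte:02x}" for byte in octets)


if __name__ == "__main__":
    # Hypothetical serial number bytes; a real board would read its own ID.
    print(stable_mac_from_id(b"example-serial-0001"))
```

Two implementations only agree on the resulting address if they hash the same input bytes and pick the same six bytes, which is the unification concern raised above.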
d
Oh, ok, I get it now
So, currently people get random MACs, with the PR #50 they start getting a static MAC (but different for each board?) until/unless either we switch to U-boot or you port this code?
c
Yeah. There are two different algorithms for "static MAC but different for each board." One was just broken, the other is in U-Boot.
d
So I don't think this is bad. Currently people have random MACs that are changing and they have to assign one by hand. After this PR they'll have a random MAC that is not changing, so this is a benefit at this stage.
I don't think this is going to be much of a problem if the MAC changes again later
Since I'm always putting this, currently:
Copy code
setenv bootargs earlycon=uart8250,mmio32,0x02500c00 console=ttyS3,115200 loglevel=8 aw-ubi-spinand.ubootblks=24 ubi.mtd=sys root=ubi0_5 rootfstype=ubifs,rw init=/sbin/init partitions=mbr@ubi0_0:boot-resource@ubi0_1:env@ubi0_2:env-redund@ubi0_3:boot@ubi0_4:rootfs@ubi0_5:recovery@ubi0_6:dsp0@ubi0_7:private@ubi0_8:UDISK@ubi0_9: cma=8M sunxi_gmac.mac_str=${ethaddr}
load mmc 0 ${kernel_addr_r} zImage
load mmc 0 ${fdt_addr_r} sun8iw20p1-t113-100ask-t113-pro.dtb
fdt addr ${fdt_addr_r}
fdt rm /psci
fdt rm /firmware/optee
fdt set /soc@3000000/eth@4500000 mac-address []
fdt set /aliases ethernet0 "/soc@3000000/eth@4500000"
mw 02001000 ca003a00 #OC
mw 02001000 ca002900
sleep 1
bootz ${kernel_addr_r} - ${fdt_addr_r}
but I'm unsure if we still need or even want to set
mac-address
and
local-mac-address
since we're not using this
c
And if someone does get attached and doesn't want another change when U-Boot stabilizes, I could just port the algorithm (or we encourage them to set theirs as fixed before the change lands)
d
Indeed. The firmware is in development and things are going to be changing. I understand some changes like this may be inconvenient, though
Yet another thing that could be done is to use the persistent storage with some variable that will define which MAC to use
If the version of the firmware before this big change provides a tool to set this and the new firmware looks for the variable and utilizes it (super raw idea, though, because there's a question of how), this could be added to the firmware release info
c
The ideal would be if TMs had an OUI assignment from IEEE and put permanent MAC addresses in the EEPROM at the factory.
d
Or there could be a tool in the previous firmware to generate a MAC and set it in the persistent storage and all the following firmware versions might use it. This way people will get a static MAC before all these changes and we won't have to rely on the U-boot's MAC generator at all
This could also be used to set a user MAC (a MAC defined by the user)
c
Whatever it is should be consistent with U-Boot's own network stack though, in case people want to PXE boot their BMC or need to do recovery over Ethernet.
But keeping it in EEPROM and having U-Boot pull from there should work fine
d
Sure. I don't think that'd be a problem to create such tool
c
Ditto if it's configured in the U-Boot env
d
What's more, the script I mentioned could be even added into the
/etc/init.d
to automatically run and set the MAC in the EEPROM if it's absent
No user action would be required and the MAC will be consistent pre and post U-boot
I am looking for an ability to dynamically change the CPU frequency. The CPU supports
DVFS
(dynamic voltage and frequency scaling) and I checked:
Copy code
# ls -l /sys/firmware/devicetree/base/cpu-opp-table/
total 0
-r--r--r--    1 root     root            34 Jan  1  1970 compatible
-r--r--r--    1 root     root            14 Jan  1  1970 name
-r--r--r--    1 root     root             6 Jan  1  1970 nvmem-cell-names
-r--r--r--    1 root     root             4 Jan  1  1970 nvmem-cells
-r--r--r--    1 root     root             0 Jan  1  1970 opp-shared
drwxr-xr-x    2 root     root             0 Jan  1  1970 opp@1008000000
drwxr-xr-x    2 root     root             0 Jan  1  1970 opp@1104000000
drwxr-xr-x    2 root     root             0 Jan  1  1970 opp@480000000
drwxr-xr-x    2 root     root             0 Jan  1  1970 opp@720000000
drwxr-xr-x    2 root     root             0 Jan  1  1970 opp@912000000
-r--r--r--    1 root     root             4 Jan  1  1970 phandle
So the OPP tables exist but the
cpufreq
is absent in the
sysfs
. I added
cpupower
tool to the firmware (compiled the firmware and flashed my board) to confirm this. Now, I don't see a way to add
cpufreq
and a governor. I'm also not yet sure if the driver supports this.
Hmm, or maybe there is a way. Recompiling the kernel 🙂
So, I added this:
Copy code
#
# CPU Power Management
#

#
# CPU Frequency scaling
#
CONFIG_CPU_FREQ=y
CONFIG_CPU_FREQ_GOV_ATTR_SET=y
CONFIG_CPU_FREQ_GOV_COMMON=y
CONFIG_CPU_FREQ_STAT=y
# CONFIG_CPU_FREQ_TIMES is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_USERSPACE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_CONSERVATIVE is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_SCHEDUTIL is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
# CONFIG_CPU_FREQ_GOV_POWERSAVE is not set
CONFIG_CPU_FREQ_GOV_USERSPACE=y
CONFIG_CPU_FREQ_GOV_ONDEMAND=y
# CONFIG_CPU_FREQ_GOV_CONSERVATIVE is not set
# CONFIG_CPU_FREQ_GOV_SCHEDUTIL is not set

#
# CPU frequency scaling drivers
#
# CONFIG_CPUFREQ_DT is not set
# CONFIG_CPUFREQ_DUMMY is not set
CONFIG_ARM_ALLWINNER_SUN50I_CPUFREQ_NVMEM=y
# CONFIG_ARM_BIG_LITTLE_CPUFREQ is not set
# CONFIG_QORIQ_CPUFREQ is not set
# end of CPU Frequency scaling

#
# CPU Idle
#
CONFIG_CPU_IDLE=y
CONFIG_CPU_IDLE_MULTIPLE_DRIVERS=y
CONFIG_CPU_IDLE_GOV_LADDER=y
CONFIG_CPU_IDLE_GOV_MENU=y
# CONFIG_CPU_IDLE_GOV_TEO is not set
CONFIG_DT_IDLE_STATES=y

#
# ARM CPU Idle Drivers
#
# CONFIG_ARM_CPUIDLE is not set
CONFIG_ARM_PSCI_CPUIDLE=y
# CONFIG_ARM_HIGHBANK_CPUIDLE is not set
# end of ARM CPU Idle Drivers
# end of CPU Idle
# end of CPU Power Management
but this didn't work
c
I have no idea what OPP is, will need to research. At first glance, looks like a system for defining points on a curve that maps desired freq -> settings?
d
They contain CPU frequency, voltage, etc., like profiles available to be used
I think it's the driver which does not support this
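To make the "profiles" idea concrete, here is a minimal sketch of picking an operating point from the frequencies listed under `cpu-opp-table` in the listing above. Only the frequencies are used because the voltages are not shown in this thread; the function name is illustrative and this does not model any particular cpufreq governor.

```python
# Frequencies (Hz) taken from the opp@... node names in the device tree listing.
OPP_HZ = (480_000_000, 720_000_000, 912_000_000, 1_008_000_000, 1_104_000_000)


def pick_opp(requested_hz, table=OPP_HZ):
    """Return the highest operating point not above requested_hz, or None."""
    candidates = [hz for hz in table if hz <= requested_hz]
    return max(candidates) if candidates else None


if __name__ == "__main__":
    for request in (1_200_000_000, 1_000_000_000, 400_000_000):
        print(request, "->", pick_opp(request))
```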
c
I did delete a node from the original devicetree describing a voltage regulator for the CPU core, but it was PWM on a nonexistent GPIO pin
I still don't know if that was included by mistake or if the nonexistent pin was actually internal and drove an internal on-die regulator 🤷‍♂️
d
I checked this change with the fully-official firmware first
c
Stuff with voltage scares me... more potential for magic smoke or electromigration or other permanent damage, so I don't really want to experiment with it unless I'm 100% sure that it's what the vendor recommends
If we start working on voltage scaling support, we'll probably want to use your sacrificial MangoPis for this 😛
d
Yes, most likely 🙂 I have a couple 🙂
So I can make a mistake twice 😉
But these tables come from the dtb, so I guess they're correct
Also the max official frequency is a bit higher than the current fixed one
c
Yeah - my bigger fear is we write a voltage to the wrong register or we write it incorrectly or something
d
1104000000
vs
1008000000
Yeah, I understand. I'll try to look at the driver again and at the other drivers and see if I find anything. This is all super new to me 😄
c
It's fun though to peer behind the curtain and know exactly what the system is doing at the absolute low level though, isn't it? 😄
d
Oh you're damn right 😄
c
I'm actually pretty new to writing bootloader support and kernel drivers and things like that, but I've reverse-engineered many, many embedded systems like this, and the thrill is the same
d
For the drivers, the only options are:
Copy code
#
# CPU frequency scaling drivers
#
# CONFIG_CPUFREQ_DT is not set
# CONFIG_CPUFREQ_DUMMY is not set
CONFIG_ARM_ALLWINNER_SUN50I_CPUFREQ_NVMEM=y
# CONFIG_ARM_BIG_LITTLE_CPUFREQ is not set
# CONFIG_QORIQ_CPUFREQ is not set
# end of CPU Frequency scaling
and
SUN50I
most likely just does not work here
c
Most of the register layouts we find in the T113 are very similar to those found in the (RISC-V-based) D1 and the sun50i.
e.g. the clock control block is basically a copy-paste of the one Allwinner used in the sun50i.
Here's the function U-Boot uses to adjust boot clocks on the T113 (it's the same driver used for the sun50i):
(This is a little better than "just poke a new clock speed at `02001000`" because it takes the CPU off the PLL during the change, which is smart because the PLL is most likely not glitch-free)
d
This actually cleared some confusion I had. They mention a parameter called P in the formula but there's no related register. But this code uses a parameter called M and this one has a register
You only need to take it out of PLL if clocking down.
c
Oh interesting, so clocking-up is (generally) glitch-free but clocking down might cause glitches?
Is this something you found experimentally or did you read the user manual more carefully than I did? 😉
d
I'm actually going to re-check this. It's from the manual, but maybe my memory does not work well at this stage 😄
Copy code
PLL_CPUX supports dynamic frequency adjustment (modifying the value of N). However, for the system
stability, to configure the frequency of the PLL_CPUX from a higher value to a lower one, switch the clock
source of the CPU to another clock whose frequency is not higher than the current one first, and configure
PLL_CPUX to the target low frequency, and then switch the clock source of the CPU back to PLL_CPUX
Copy code
./buildroot/output/build/linux-5112fdd843715f1615703ca5ce2a06c1abe5f9ee/arch/arm/configs/sun8iw20p1smp_t113_auto_defconfig:46:CONFIG_ARM_ALLWINNER_SUN50I_CPUFREQ_NVMEM=y
so the driver should be fine
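A minimal sketch of the ordering the manual excerpt above describes, written as a plain planning function. The step labels are illustrative only (no real register writes), and the "wait for PLL lock" step is an assumption that the excerpt does not spell out.

```python
# Sketch of the reclocking order from the manual excerpt.
# Step names are illustrative labels, not real register accesses.

def plan_cpu_reclock(current_hz: int, target_hz: int) -> list[str]:
    """Return the ordered steps for moving PLL_CPUX to target_hz."""
    if target_hz == current_hz:
        return []
    if target_hz > current_hz:
        # Raising the frequency: N can be changed dynamically.
        return ["set PLL_CPUX N for target", "wait for PLL lock (assumed)"]
    # Lowering the frequency: park the CPU on another clock first.
    return [
        "switch CPU clock source to a clock not faster than the current one",
        "set PLL_CPUX N for target",
        "wait for PLL lock (assumed)",
        "switch CPU clock source back to PLL_CPUX",
    ]


for step in plan_cpu_reclock(1_104_000_000, 480_000_000):
    print(step)
```

Only the downward change needs the detour through another clock source, which matches the "only if clocking down" remark.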
c
It does sound correct to me; we might need to look at what the driver wants in order to do its job, but I guess try building with it enabled and see what happens?
c
Oh, dang, didn't see that the sun50i was already enabled haha
So it might have either added another
compatible = "..."
that it could've supported (which wasn't present in the fdt) or it tried to start up but some config or related drivers were missing and it complained in dmesg.
Time to read
drivers/cpufreq/sun50i-cpufreq-nvmem.c
I guess
d
Yes, I want to see if the registers used are correct
But then, even if the registers were wrong, something should change
I indeed did not check the boot log yet
Maybe:
Copy code
[    0.006008] /cpus/cpu@0 missing clock-frequency property
[    0.006035] /cpus/cpu@1 missing clock-frequency property
c
So the init for this driver: 1. Confirms that the root node lists a suitable string (one is
"arm,sun20iw1p1"
) in its
compatible
property (I think it does already?) 2. It registers a driver under
/sys/bus/platform/drivers
called (apparently)
sun50i-cpufreq-nvmem
3. It tries to register a platform device of the same name, under
/sys/bus/platform/devices
d
Copy code
c
static const struct of_device_id sun50i_cpufreq_match_list[] = {
    { .compatible = "arm,sun50iw9p1", .data = &sun50iw9_soc_data, },
    { .compatible = "arm,sun50iw10p1", .data = &sun50iw10_soc_data, },
    { .compatible = "arm,sun8iw20p1", .data = &sun8iw20_soc_data, },
    { .compatible = "arm,sun20iw1p1", .data = &sun20iw1_soc_data, },
    {}
};
c
Ah okay we have
arm,sun8iw20p1
d
This driver depends on
NVMEM
which is enabled
I guess this means more than I see (since you mentioned it)
c
Do you see the entry under
/sys/bus/platform/drivers
?
Yeah the driver's init wants to see one of the entries in this table in
/sys/firmware/devicetree/base/compatible
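For checking that first condition by hand, here is a small sketch that reads the root `compatible` property and compares it with the strings from the match table quoted above. It assumes the property is exposed as NUL-separated strings, which is how device-tree string lists are stored. The function names are made up for this example.

```python
from pathlib import Path

# Strings copied from the match table quoted above.
SUPPORTED = {"arm,sun50iw9p1", "arm,sun50iw10p1", "arm,sun8iw20p1", "arm,sun20iw1p1"}


def parse_compatible(raw: bytes) -> list[str]:
    """Split a device-tree 'compatible' property (NUL-separated strings)."""
    return [s.decode("ascii") for s in raw.split(b"\0") if s]


def matching_entries(raw: bytes) -> list[str]:
    """Return the compatible strings that the driver's table would accept."""
    return [s for s in parse_compatible(raw) if s in SUPPORTED]


if __name__ == "__main__":
    path = Path("/sys/firmware/devicetree/base/compatible")
    if path.exists():
        print(matching_entries(path.read_bytes()))
```

An empty list would mean the driver's init bails out at step 1. A non-empty list means the problem is further along.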
d
I'm not sure which entry you mean:
Copy code
MBUS_PMU              reg-dummy             sunxi-gmac
alarmtimer            reg-fixed-voltage     sunxi-gpadc
allwinner,sunxi-hdmi  simple-reset          sunxi-iommu
disp                  spi                   sunxi-keyboard
dump_reg              ss                    sunxi-mmc
eeprom-sunxi-sid      sun4i-ts              sunxi-ohci
gpio-clk              sun6i-dma             sunxi-rc-recv
i2c                   sun6i-prcm            sunxi-rtc
of_fixed_clk          sun8iw20-ccu          sunxi-rtc-ccu
of_fixed_factor_clk   sun8iw20-pinctrl      sunxi-sram
otg manager           sun8iw20-r-ccu        sunxi-wdt
pwm-regulator         sun8iw20-rtc-ccu      sunxi_pwm
pwrseq_emmc           sunxi-cedar           sunxi_usb_udc
pwrseq_simple         sunxi-ehci            uart
c
Supposedly it should be
sun50i-cpufreq-nvmem
But I'm not seeing it, so it looks like the driver is refusing to init
d
Ah, right. I'm a bit tired and didn't quite understand your question initially 🙂
c
Well that makes two of us who are tired haha
None of the code in this driver looks like it's related to controlling the CPU frequency, it looks like its only job is to grab the operating points table out of efuses?
d
Copy code
For some SoCs, the CPU frequency subset and voltage value of each OPP
varies based on the silicon variant in use. Allwinner Process Voltage
Scaling Tables defines the voltage and frequency value based on the
speedbin blown in the efuse combination. The sun50i-cpufreq-nvmem driver
reads the efuse value from the SoC to provide the OPP framework with
required information.
It also says (just putting this here in case you see something I did not):
Copy code
Required properties:
--------------------
In 'cpus' nodes:
- operating-points-v2: Phandle to the operating-points-v2 table to use.

In 'operating-points-v2' table:
- compatible: Should be
    - 'allwinner,sun50i-h6-operating-points'.
- nvmem-cells: A phandle pointing to a nvmem-cells node representing the
        efuse registers that has information about the speedbin
        that is used to select the right frequency/voltage value
        pair. Please refer the for nvmem-cells bindings
        Documentation/devicetree/bindings/nvmem/nvmem.txt and
        also examples below.

In every OPP node:
- opp-microvolt-<name>: Voltage in micro Volts.
            At runtime, the platform can pick a <name> and
            matching opp-microvolt-<name> property.
            [See: opp.txt]
            HW:        <name>:
            sun50i-h6    speed0 speed1 speed2
c
I read that to mean, "the necessary voltage to hit each frequency is determined at the time each T113 is manufactured and burned into the efuses, so you need to read it per-chip"
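As a rough illustration of that reading, the sketch below picks the `opp-microvolt-<name>` value for a chip's speed bin. The voltages are made up for the example and are NOT taken from the T113 device tree. The dictionary layout is also just an assumption for illustration.

```python
# Hypothetical OPP entry: the microvolt numbers are invented for
# illustration and do not come from any real device tree.
EXAMPLE_OPP = {
    "opp-hz": 1_008_000_000,
    "opp-microvolt-speed0": 900_000,
    "opp-microvolt-speed1": 950_000,
}


def voltage_for_bin(opp: dict, speed_bin: int) -> int:
    """Pick the opp-microvolt-<name> value for the given speed bin."""
    key = f"opp-microvolt-speed{speed_bin}"
    if key not in opp:
        raise KeyError(f"OPP has no entry for speed bin {speed_bin}")
    return opp[key]


print(voltage_for_bin(EXAMPLE_OPP, 1))
```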
d
This too, but I mostly mean it's a module that cpufreq will load and use, and that it does not set anything directly. But it's called a frequency scaling driver, which confuses me a little
c
Looks like all you really need is a
clocks = <...>;
property on the dt nodes for the CPU cores, and the operating points table is for Linux to set suitable voltages when the clocks are changed.
d
I'm pretty sure I've seen them there
c
See e.g.
d
I mean the clocks
c
Ah yep
d
Copy code
=> fdt list /cpus/cpu@0
cpu@0 {
        device_type = "cpu";
        compatible = "arm,cortex-a7", "arm,armv7";
        reg = <0x00000000>;
        enable-method = "psci";
        clocks = <0x00000002 0x00000016>;
        dynamic-power-coefficient = <0x0000009c>;
        cpu-idle-states = <0x00000003 0x00000004>;
        operating-points-v2 = <0x00000005>;
        #cooling-cells = <0x00000002>;
        cpu-supply = <0x00000006>;
        phandle = <0x0000000b>;
};
c
I mean this looks correct to me, I wonder if there's some generic CPU reclocking driver that's just not enabled?
d
This link is the same as the one above
c
Oh oops:
d
Yeah, I'm wondering this too
c
Seems to me that either the CCU driver doesn't know how to handle reclocking requests, or Linux is not configured to generate those requests 🤷‍♂️
d
The
cpufreq
is missing in the
/sys/devices/system/cpu
so something is missing for sure
Or maybe not for sure, but there's also no error in the boot log that I see
I added
cpufreq
to the kernel config and have seen it compiling
c
By that you mean
CONFIG_CPU_FREQ
?
d
Yes
c
Yeah come to think of it that's a requirement for the sun50i_nvmem so
d
Yes, as much as I understand there is a generic driver and then specific ones like
sun50i-cpufreq-nvmem
c
It's weird to me that changing this isn't having any apparent effect.
As a sanity-check, does
uname -a
reflect the correct build time and everything?
d
You might have a good point
Copy code
# uname -a
Linux turing 5.4.61 #2 SMP PREEMPT Sun Mar 5 22:37:36 CET 2023 armv7l GNU/Linux
but how?
What could I possibly miss?
Should I do a full reflash?
Not OTA?
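The sanity check can be made mechanical by pulling the build timestamp out of the `uname -a` line and comparing it with when the firmware was built. A small sketch, assuming the usual `<weekday> <month> <day> <time> <tz> <year>` layout of the version field and ignoring the timezone:

```python
from datetime import datetime

WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def kernel_build_time(uname_line: str) -> datetime:
    """Extract the build timestamp from a `uname -a` line (timezone dropped)."""
    tokens = uname_line.split()
    # Find the weekday token that starts the build date.
    idx = next(i for i, t in enumerate(tokens) if t in WEEKDAYS)
    month, day, clock, _tz, year = tokens[idx + 1: idx + 6]
    return datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")


line = "Linux turing 5.4.61 #2 SMP PREEMPT Sun Mar 5 22:37:36 CET 2023 armv7l GNU/Linux"
print(kernel_build_time(line))
```

A date months older than the last build is a strong hint that the running kernel was never replaced.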
c
Official firmware doesn't bundle the kernel inside the OTA updates or filesystem at all; the only way to update it is with PhoenixSuit (or you can use U-Boot and just replace the
zImage
)
d
Well, this explains a lot then
I might not have time to set it up now (I'm not at home) but definitely will tomorrow
c
👍
But yeah the official firmware's OTA updates only updating the rootfs and not the kernel is one of my big motivators to update U-Boot so we can keep the kernel inside the rootfs
d
Crap, cannot flash it. It sees the board, restarts but never continues
I may need this recovery card
Hmm, I actually probably have one
Also does not work
Hmm
I tried to interrupt it and run
efex
and this is all I see:
Copy code
=> efex
## jump to efex ...
[06.779][mmc]: MMC Device 2 not found
[06.783][mmc]: mmc 2 not find, so not exit▒[20]HELLO! BOOT0 is starting!
[23]BOOT0 commit : 2c94b33
[26]set pll start
[31]periph0 has been enabled
[35]set pll end
[36][pmu]: bus read error
[39]board init ok
[40]rtc[2] value = 0x5aa5a55a
[43]eraly jump fel
- no matter which method I'm trying
Ah, lol, had to install the driver
c
Is it in FEL mode (can you see it over USB)?
Ah
d
It's flashing now
It definitely flashed since I see the same thing I saw when I was trying multiplane
I probably did not revert everything
c
I hate that there's no clean way to turn that off
d
Still same issue. I'm doing something wrong. Might need to re-download the repo and start from scratch.
c
Did you also clear the SIMULATE_MULTIPLANE flag in the buildroot uboot directory?
Since that would need to be reenabled if so
d
Yeah, or at least I thought so. Found a place where I missed that
Retrying (for the last time today)
Still the same issue. What the heck
Something's off. I was able to boot it with your U-boot, but the kernel is old
zImage
in
buildroot/output/images
has not been updated since March
This makes me feel that when we tested disabling simulate multiplane, the test was not valid either
Their u-boot is updated, just not zImage
No, u-boot has been updated for the last time in May
I'm going to remove
images
in
buildroot/output
Looks like the whole
output
has to go
OK, so after doing this I finally see some difference.
cpufreq
and
cpuidle
appeared in the
/sys/devices/system/cpu
, but the
cpufreq
is empty. But this is something to further look at later.
cpupower
shows no compatible hardware. I also learned that the uboot and zimage do not get rebuilt even when I thought they would. Kind of... silly (for lack of a better word), so maybe something to look at too. Finally, I'll look again at disabling simulate multiplane with these latest findings
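A quick way to tell "cpufreq directory exists but nothing registered" apart from "cpufreq missing entirely" is to list the policy directories. This is a sketch that assumes the usual `policyN` naming under the `cpufreq` directory. The function name is made up.

```python
from pathlib import Path


def cpufreq_policies(cpu_root: str = "/sys/devices/system/cpu") -> list[str]:
    """List policy directories under <cpu_root>/cpufreq, empty if none."""
    cpufreq = Path(cpu_root) / "cpufreq"
    if not cpufreq.is_dir():
        return []
    return sorted(p.name for p in cpufreq.iterdir() if p.name.startswith("policy"))


if __name__ == "__main__":
    found = cpufreq_policies()
    print(found if found else "no cpufreq policies registered")
```

An empty result with the directory present matches the state described here: the core is built in, but no driver has registered a policy yet.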
m
I see this on the 1.0.2 announcement:
Copy code
Please note that the .swu package provided does not contain the kernel itself. To upgrade to 1.0.2 its recommended to flash the .img file using the phoenix suit.
Does phoenixsuit work on non Win platforms? I guess updating this page is where my question should be answered: https://help.turingpi.com/hc/en-us/articles/8686945524893-Baseboard-Management-Controller-BMC-
s
It looks like that page has not yet been updated to point to the new GitHub location. Also, I have the same question: How can the img get flashed from a non-Windows host?
c
I can't think of a better solution than, "launch Windows in a VM running under your non-Windows host and run PhoenixSuit inside that" :/
s
What's the limitation preventing the distribution of an all-in-one OTA update, as with 1.0.1 - is it a not-having-had-time thing, or is there a technical reason that it's now not possible?
(... is it possible to live-patch a running BMC to the new release in-place, manually?)
c
1.0.2 updated the kernel, which isn't built into the OTA updates (it only updates rootfs)
s
I'd hope that soon (if not already?) a BMC upgrade would consist of writing half of the NAND, and then switching banks to use the newly-written segment(s)?
c
It sorta does that now. It writes a
recovery
volume first, reboots to that, then writes the rootfs. But it only does it for rootfs, the kernel isn't kept in there.
s
Hmm - so will there ever be a post-1.0.1 upgrade which is able to be performed without Windows (in some form?) 😯
c
Most of the work @DhanOS (Daniel Kukiela) and I are doing is focused on massively simplifying the boot and firmware update process, with my goal being that full updates can be done OTA and bad-flash recovery can be done over Ethernet (DHCP), USB port (Rust-based cross-platform tool), or microSD card.
s
So is there any reasonable way to safely manually write the updated kernel to the appropriate NAND partition, or am I best skipping 1.0.2 and awaiting a future OTA update, would you recommend?
c
If you want to build 1.0.2 yourself so you have the
boot.img
you can
scp
that to your BMC (somewhere in
/tmp
preferably) then use
ubiupdatevol /dev/ubi0_4 boot.img
and
reboot
, I'm about 95% sure that will work (the other 5% is it won't boot and you'll need to break out PhoenixSuit)
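Spelled out as a dry run, the manual procedure above looks roughly like this. The sketch only builds and prints the commands and does not execute anything. The host name `turing.local` and the use of `root` over SSH are assumptions for illustration, and the same "probably works" caveat applies.

```python
import shlex


def kernel_update_commands(bmc_host: str, image: str = "boot.img") -> list[str]:
    """Return the shell commands for the manual kernel update, in order."""
    remote = f"/tmp/{image}"
    return [
        f"scp {shlex.quote(image)} root@{bmc_host}:{remote}",
        f"ssh root@{bmc_host} ubiupdatevol /dev/ubi0_4 {remote}",
        f"ssh root@{bmc_host} reboot",
    ]


# Hypothetical host name; nothing is executed here.
for cmd in kernel_update_commands("turing.local"):
    print(cmd)
```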
s
This documentation needs an update pass, which I will do shortly
Using PhoenixSuit is currently the only “official” supported way. Improving the upgrade experience is on our roadmap and hopefully we can provide you with better alternatives soon. @CFSworks is doing some exciting work trying to use a vanilla uboot bootloader. But what's holding us back from just kicking off swupdate from uboot? We should be able to find enough space to write a swu package. Or what about setting up a TFTP server?
*Given we add the kernel to the upgrade package
c
What's the user story you have in mind behind that last question? U-Boot is nice and powerful and we could script it to be the update solution if we want - just trying to figure out what you have in mind?
s
add the kernel to a swu package, let uboot install it, being it from a storage (sdcard/nand ) or perhaps from the interwebs?
c
The current roadmap I'm picturing is that the kernel is kept in
/boot/zImage
and the boot script (what U-Boot executes to bring up the kernel) is
/boot/boot.scr
, which solves the problem of the kernel and rootfs being separate upgrades.
I'm also wondering if you have any added thoughts on having the rootfs be an overlay of a SquashFS base (which is read-only so it remains "golden" from the time the firmware is built) with UBIFS overlayed atop.
But if so the firmware image would, conceptually, just be the SquashFS itself
U-Boot does support acting as a TFTP server though:
That's not quite the same as "take the received file and write it to storage, then boot normally" but it's pretty close
s
I agree with you: having the kernel nicely in the filesystem is the direction to go in, but it is not mandatory to make kernel upgrades work. As an intermediate step, could we write the kernel to the same place on flash? We could have the webserver write a swu to flash (maybe we can sacrifice the recovery partition to make more space?), which will be picked up by the bootloader on a successive boot. Having squashfs is something I like and want 🙂 Progress will be tracked here https://github.com/turing-machines/BMC-Firmware/issues/19
c
If we only want a .swu with the kernel present, we could probably just bundle
boot.img
in there and update the swupdate config so it writes to UBI volume #4 ("boot"); I'd recommend testing this heavily before shipping such a .swu though.
The problem with having the kernel outside of the rootfs is it circumvents the whole A/B flashing process. A bad write to the kernel volume would brick the TP2, forcing the user to use PhoenixSuit
s
I know it's maybe not really an atomic update
Maybe, now that I think about it more, it's a bad workaround
You could make 2 kernel partitions, but then you might as well go all the way you suggested in the first place
c
Let me reboot back to the shipped bootloader real quick, we might be able to get it to load the kernel out of the rootfs with just env updates
Copy code
=> ubifsmount rootfs
[153.429]UBIFS error (pid: 1): cannot open "rootfs", error -22
Error reading superblock on volume 'rootfs' errno=-22!
Lovely.
s
That would have been too easy..
Is uboot compatible with the ubifs version we use ?
c
Shipped or mainline?
s
Shipped
c
I don't know about the shipped version but mainline doesn't implement
SIMULATE_MULTIPLANE
so there's a completely different view of the NAND, so it can't read it at all (without reformatting).
Which is probably fine because I'm hoping to repartition the NAND to give more space to the UBI partition anyway, so may as well take care of both at the same time heh
j
isn't that already possible as the native uboot can root ubi partitions?
c
Apparently it cannot
j
yeah seen it, answered before the end of the backlog
v
Just a heads up for future searchers, I definitely hosed my TPi2 trying to use anything other than PhoenixSuit to upgrade the firmware. Tried PhoenixCard in "Startup mode" and it got the device into a bad state which wouldn't boot. Tried LiveSuit on an Ubuntu 18.04 machine I had set up for a Jetson Nano, it would seemingly flash but then the board would report
damaged volume, update marker is set
on reboot.
w
So what do I lose from moving from CE to this latest official release?
d
Sleep.... 😇
c
Nothing here is final, just spitballing some ideas (this should be pretty easy to implement though) - would love some input on what to do with the
???
box though!
b
Beautiful @CFSworks I like it
s
Looks nice 👍 Perhaps we also want to make the boot order configurable at some point. I would really love to have a server running which the board connects to and runs its recovery image from. Need to make sure we build some serious security around it though
c
More advanced users (or the web control panel) can modify the U-Boot env in order to re-script the boot process any way they desire, including the "pull from a server" approach. This flowchart is just what I have in mind for the default install.
p
Hmm - looking at discussion above - seems I'm skipping 1.0.2 update and waiting on the next one - given that I don't have a Windoze machine and don't trust a Windoze VM running on a Mac M1 (too many issues w/ Apple silicon already)!
d
I think mostly just ntp if you are using it. But even then, ntp does not update the hardware clock. I'm going to rebase the CE onto the current official firmware, re-implement the changes, test them and probably make a PR. The CE was created before the official 1.0.1 to address some of the most requested things, but with the development of the official repo, these features should be there, and the plan for the future CE is to let people do more than will be possible with the official repo. This still needs a bit more time, though
Nice, thank you! The first thing I noticed is that it'd be nice to also have the LEDs show the current state. For example, if you hold the button for less than 10s, some LED(s) should be on or blinking, and the pattern could change if you hold it for more than 10s; same for the other states. But I'm guessing this is the plan.
The ??? part, for me, would ideally be that it connects to some external host in a secure way (HTTPS, for example?) and performs the recovery without a need to download and run anything manually. This raises a few questions, for example how to show what's going on, whether the firmware is downloading, flashing, etc. But I guess maybe the LEDs again?
I attempted
cpufreq
again and I think I got it working:
Copy code
# cpupower frequency-info
analyzing CPU 0:
  driver: cpufreq-dt
  CPUs which run at the same hardware frequency: 0 1
  CPUs which need to have their frequency coordinated by software: 0 1
  maximum transition latency: 244 us
  hardware limits: 480 MHz - 1.10 GHz
  available frequency steps:  480 MHz, 720 MHz, 912 MHz, 1.01 GHz, 1.10 GHz
  available cpufreq governors: conservative ondemand userspace powersave performance schedutil
  current policy: frequency should be within 480 MHz and 1.10 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency: 1.10 GHz (asserted by call to hardware)
# cpupower frequency-set -g conservative
Setting cpu: 0
Setting cpu: 1
# cpupower frequency-info
analyzing CPU 0:
  driver: cpufreq-dt
  CPUs which run at the same hardware frequency: 0 1
  CPUs which need to have their frequency coordinated by software: 0 1
  maximum transition latency: 244 us
  hardware limits: 480 MHz - 1.10 GHz
  available frequency steps:  480 MHz, 720 MHz, 912 MHz, 1.01 GHz, 1.10 GHz
  available cpufreq governors: conservative ondemand userspace powersave performance schedutil
  current policy: frequency should be within 480 MHz and 1.10 GHz.
                  The governor "conservative" may decide which speed to use
                  within this range.
  current CPU frequency: 480 MHz (asserted by call to hardware)
Tested it with: ``` time dd if=/dev/urandom of=/dev/zero bs=1024 count=65536 ```and the
conservative
governor. It sits at 480MHz while idle and jumps to 1.1GHz when there's a load on the CPU
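To watch the governor and clock while running a load test like this, the relevant lines of the `cpupower frequency-info` report can be parsed. A small sketch, assuming the output format shown above. The function name is made up, and the sample text is copied from that output.

```python
import re


def parse_frequency_info(text: str) -> dict:
    """Pull the governor and current frequency (in MHz) out of cpupower's report."""
    governor = re.search(r'The governor "([^"]+)"', text)
    current = re.search(r"current CPU frequency: ([\d.]+) (MHz|GHz)", text)
    mhz = None
    if current:
        value, unit = float(current.group(1)), current.group(2)
        mhz = round(value * 1000) if unit == "GHz" else round(value)
    return {"governor": governor.group(1) if governor else None, "current_mhz": mhz}


sample = '''
  current policy: frequency should be within 480 MHz and 1.10 GHz.
                  The governor "conservative" may decide which speed to use
                  within this range.
  current CPU frequency: 480 MHz (asserted by call to hardware)
'''
print(parse_frequency_info(sample))
```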
This was with the stock firmware. It does not work with the new U-boot, which makes me think it's something about the dtb
This is because I derped like a noob 😉 I copied the updated
zImage
and now everything works as expected. We also don't have to set
mw 02001000 ca002900
- the boot process takes a couple of seconds longer (it's slower before the governor kicks in) but I guess setting the CPU speed in U-boot is still a good thing to do
c
I have since found the correct config option to make the SPL set a 1008MHz clock at-boot (as Allwinner's boot process does) so we don't need to be poking at
02001000
anymore at all 🎊
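For reference, the value that used to be poked into `02001000` can be decoded into a frequency. This sketch assumes a 24 MHz reference, N stored minus one in bits 15:8 and M stored minus one in bits 1:0, which is the layout used for this PLL on the closely related D1. Treat the field positions as an assumption rather than something confirmed in this thread.

```python
OSC_HZ = 24_000_000  # assumed 24 MHz reference oscillator


def pll_cpux_hz(reg: int) -> int:
    """Decode PLL_CPUX output, assuming N in bits 15:8 and M in bits 1:0."""
    n = ((reg >> 8) & 0xFF) + 1   # multiplier is stored minus one
    m = (reg & 0x3) + 1           # divider is stored minus one
    return OSC_HZ * n // m


print(pll_cpux_hz(0xCA002900))  # 1008000000
```

Under those assumptions `ca002900` gives N = 42 and M = 1, i.e. 24 MHz × 42 = 1008 MHz, the same fixed clock mentioned earlier.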
d
I'm attempting to disable simulate multiplane again with a different approach, but got the same issue as before (even though I confirmed both
U-boot
and
zImage
got updated this time). But I have one more idea for how to try it (that'd be the closest to how this should be done, with appropriate patches) and we'll see, but it takes time now to compile everything from scratch and a VM ran out of disk space 😄
c
I'm starting to think that disabling simulate multiplane will be a change that coincides with upgrading the bootloader
d
Oh, and the nice thing about getting the
cpufreq
to work is that we just gained about 10% of CPU speed, since 1.1GHz is the official stable max
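The gain can be checked from the two frequencies quoted earlier (1008000000 fixed vs 1104000000 as the top operating point):

```python
fixed_hz = 1_008_000_000   # previous fixed clock
max_hz = 1_104_000_000     # top operating point from the dtb

gain = (max_hz - fixed_hz) / fixed_hz
print(f"{gain:.1%}")
```

That works out to a little under 10%.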
c
I'd also be curious to know what average current a PicoPSU draws at idle now.
i.e. average the current draw for 1 hour before and after this change and see how many mAh it consumes
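A sketch of that measurement, assuming evenly spaced current readings in mA. The sample values are placeholders and not measurements from a real board.

```python
def consumed_mah(samples_ma: list[float], interval_s: float) -> float:
    """Integrate evenly spaced current samples (mA) into mAh."""
    if not samples_ma:
        return 0.0
    average_ma = sum(samples_ma) / len(samples_ma)
    hours = len(samples_ma) * interval_s / 3600
    return average_ma * hours


# Placeholder readings: one sample per second for an hour.
print(consumed_mah([330.0] * 3600, interval_s=1.0))
```

Running it once on a capture taken before the change and once on a capture taken after gives two directly comparable numbers.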
d
I don't have a bench power supply, but I can measure voltage and current to see. But only when I get home (this weekend). What has been measured before was 4W, so I'm not expecting much less with the lower idle clock
c
Yeah, and much of that 4W is probably going to non-BMC things anyway.
With much of the community's reaction to 1.0.2 being "I don't have Windows, and the .swu doesn't upgrade the kernel, so I'm skipping this release" I'm also trying to work out a way to upgrade the bootloader from the .swu
d
Yeah, I've seen your comments
c
(We should make sure we have a recovery SD card image available first because an OTA upgrade of a bootloader is just asking for a few brickings, but there might be a way to do it pretty reliably)
d
I tested the "official" recovery SD card image and it works
I mean I used it a few times already 😄
c
Ah, the "force it in FEL mode" SD image?
d
There's also another thing - v1.0.1 changed the partition layout, and people who used PhoenixSuit with the img file have these UBI partitions bigger by 7 or 9MB (I do not recall now how much)
c
The trouble is the new bootloader and NAND layout (w/o SIMULATE_MULTIPLANE) is very likely to break compatibility with PhoenixSuit.
d
It looks more like a working OS whose only job is to restart into forced FEL mode
c
So a much larger version of 😉
d
Yes, probably 🙂
c
How wedded are we to adbd really -- would an ACM serial gadget be good enough for providing shell access to the BMC over USB? Is there something else we want it for?
d
I think that
adbd
is not necessary. Just about any way that's easy to use over USB will be enough.
adbd
is cross-platform and easy to install. I don't know about the
ACM
c
ACM is the standard USB class for emulating serial ports, most OSes have a driver for it built-in
d
Oh, so a serial port over USB. That'd be a good replacement IMO
c
Today is mostly "clean up my U-Boot git tree and tend to patch review" day so my brain's kinda trying to plan ahead to keep entertained I guess 😂
d
This IS entertaining 😄
Ok, a new firmware just finished building. Fingers crossed
c
Once I get this cleaned up I think I'm ready to flash U-Boot to my NAND, and say goodbye to the old boot chain (on one of my boards, anyway)
d
Ugh, I might need to build the firmware again since I probably derped once again, LOL
No, I did not. The img file size is different
Well, ok, I don't have to flash this firmware to know it won't work. I modified the downloaded files like:
Copy code
./buildroot/dl/uboot/git/include/linux/mtd/aw-spinand.h:29:#define SIMULATE_MULTIPLANE (0)
and removed the `output` folder to make the image fully from scratch, but when it makes its way to the `output` folder:
Copy code
./buildroot/output/build/uboot-69b04a0b3dd5c412f66e9dbfd02876eebfd99646/include/linux/mtd/aw-spinand.h:29:#define SIMULATE_MULTIPLANE (1)
and I don't quite understand how and why
c
It might be necessary to use Buildroot's "patching" facility to apply this, since it might be re-extracting the Git snapshot with every build
d
This is my next idea
That'd be the right-est way to do this
Attempting this now
c
Oops, bricked it, need to go rewrite the SD I guess
Copy code
U-Boot SPL 2023.07-rc2-00319-gb5a03a4ae6 (Jun 05 2023 - 12:17:57 -0600)
DRAM: 128 MiB
Unknown boot source 4

resetting ...

U-Boot SPL 2023.07-rc2-00319-gb5a03a4ae6 (Jun 05 2023 - 12:17:57 -0600)
DRAM: 128 MiB
Unknown boot source 4

resetting ...
d
Haha 🙂
c
Copy code
U-Boot SPL 2023.07-rc2-00320-g95ea8c294b (Jun 05 2023 - 12:44:42 -0600)
DRAM: 128 MiB
Trying to boot from sunxi SPI


U-Boot 2023.07-rc2-00320-g95ea8c294b (Jun 05 2023 - 12:44:42 -0600) Allwinner Technology

CPU:   Allwinner R528 (SUN8I)
Model: Turing Machines Turing Pi 2 BMC
DRAM:  128 MiB
...okay nice so we can now write U-Boot straight to the NAND and it works
d
I'm figuring out how to apply this patch to U-boot 🙂
s
Moving away from the phoenix-suit is something i happily endorse 🙂 as long as we have adequately tested it, and have a recovery mechanism to go back to the old situation. Also we would need some time to bring the documentation up to date.
d
Ok, a simple hook I made worked:
Copy code
define UBOOT_DISABLE_SIMULATE_MULTIPLANE
	$(SED) 's/\#define SIMULATE_MULTIPLANE (1)/#define SIMULATE_MULTIPLANE (0)/' $(@D)/include/linux/mtd/aw-spinand.h
endef
UBOOT_PRE_BUILD_HOOKS += UBOOT_DISABLE_SIMULATE_MULTIPLANE
so I'm attempting to build the whole firmware now
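To see what the substitution in that hook does to the header, here is the same rewrite expressed in Python. Note that in sed's basic regex the parentheses in `(1)` are literal, whereas in Python's `re` they have to be escaped. The function name is made up for this example.

```python
import re


def disable_simulate_multiplane(header_text: str) -> str:
    """Flip the SIMULATE_MULTIPLANE define from (1) to (0)."""
    return re.sub(
        r"#define SIMULATE_MULTIPLANE \(1\)",
        "#define SIMULATE_MULTIPLANE (0)",
        header_text,
    )


print(disable_simulate_multiplane("#define SIMULATE_MULTIPLANE (1)"))
```

Text that does not contain the exact define is returned unchanged, so grepping the built tree afterwards (as was done before) is still the way to confirm the hook took effect.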
c
My fear is that the U-Boot image loaded by PhoenixSuit just to access the NAND has SIMULATE_MULTIPLANE turned on as well (and I don't believe we have the code to it?), and will write the installed image in SIMULATE_MULTIPLANE mode so it won't be readable from a bootloader+kernel without it. 😔
d
I guess we might find that out soon
But the goal is to get rid of the PhoenixSuit anyway, right? I, however, understand that the test might fail even if the image is going to be built properly
c
Yeah, though I suspect we're going to get rid of PhoenixSuit, the old bootloader, the old partition layout, and the AWNAND driver all in one go
I just made the interesting discovery that apparently the RTL8370MB switch ASIC is strapped in EEPROM autoload mode after all (I must've missed it when I first checked for it)
This means it's possible to manage the switch by putting a suitable config in the EEPROM and then resetting the switch. Kind of weak compared to the power of DSA but it's a decent stopgap for configuring VLANs and port isolation for now 🤷‍♂️
(It's also something to keep in mind if we want the EEPROM to ourselves, though apparently the switch won't read the rest of the EEPROM if the first 2 bytes are 0xFF, 0xFF)
d
So, this time I'm pretty sure that the images built properly with simulate multiplane off, but the issue still persists, mostly because of what you mentioned. So the idea I had was: if it starts booting but cannot, U-boot should have simulate multiplane off, PhoenixSuit sees it, so a consecutive flashing should work? It did not. I mean it flashed, but the issue persists. I also tried to find some SD card image sources to try it that way, but I didn't find anything useful.
s
It's not that bad, right 🤷‍♂️ How many people will change their setup regularly?
d
@svenrademakers Is this something you'd like to see in the official firmware? Pretty easy to enable and works well, so I could make a PR. It steps down the CPU frequency while idle and under load sets it to about 10% higher (stable official max frequency)
I'm trying to guess what you replied to here 😄
c
I think the file containing the deployed U-Boot is
u-boot-sun8iw20p1.bin
while the U-Boot that PhoenixSuit itself temporarily loads up during install is
u-boot.fex
and I don't think we have the source to the latter. What I don't know is for how much of the installation it has simulate_multiplane enabled. It can't be enabled for writing the boot package or boot0, since those things don't understand the simulate_multiplane layout, but it's definitely enabled for writing the
sys
partition (with the UBI images).
That's what confuses me is PhoenixSuit knows to write some of the NAND without it and the rest of the NAND with it.
s
I'm sorry, it's quite early here 😅 Just saying that having to reload the EEPROM is not the end of the world; hopefully we can pick something more DSA compliant for the next iteration of the board 🙂
This sticker is unintentional btw 😅
d
LMAO 😄
c
Oh, no problem with the EEPROM actually. If we're sure to keep the first 2 bytes unused, the RTL8370MB actually assumes it's empty. But it does mean that configuring the switch is possible even on today's firmware.
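The rule as observed here (first two bytes `0xFF, 0xFF` means the switch treats the EEPROM as empty) is easy to encode as a check before writing anything else to the EEPROM. A sketch based only on that observation; the function name is made up.

```python
def switch_will_autoload(eeprom: bytes) -> bool:
    """True if the first two bytes are not both 0xFF (per the observation above)."""
    if len(eeprom) < 2:
        raise ValueError("need at least the first two EEPROM bytes")
    return eeprom[:2] != b"\xff\xff"


# Arbitrary payload after the two marker bytes.
print(switch_will_autoload(b"\xff\xff" + b"\x12\x34"))  # False
```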
Still, the EEPROM + SPI-NOR connected to the switch is a bit weird. I'm noticing that the hardware designer pretty much just took Realtek's RTL8370MB demo/ref board and copy-pasted the schematic right into the TP2 😂
d
I started figuring this out some time ago too and knew its configuration is EEPROM-based, but never made it far enough to know about the first 2 bytes
c
Yeah I only found that out because I had a logic analyzer hooked up at the time it was resetting
s
I'm a bit more conservative when it comes to this, especially if this is not really stress tested. But I think even with the 10% boost we are within specs?
Which one do you have? I'm in the market for an actual digital one
d
This is not OC, this is from the original dtb. I only added missing pieces - drivers and a governor. I will stress-test it on 2 boards. I want to put a high synthetic load and check the temps after an hour or so. I'll be also daily-driving a firmware with this change
s
Yes, you mentioned this in your comment.. I need coffee
c
Zeroplus LAP-C, which is quite handy in a pinch (and compatible with PulseView) but if I needed this type of tool more often I would trade up
d
However this will let people OC their BMC if they wish through the
cpupower
tool
c
The datasheet for the T113-s3 doesn't appear to mention a maximum rated CPU core clock, which is a pity. It does say the DRAM is rated up to 800MHz, but nothing on the CPU.
d
c
Man what I would love is a multichannel ADC(+DAC??) that just samples at 20Msps and throws it into my laptop via USB3.0, then I can do whatever processing I want in software.
d
Correct. I've been trying to find this information too. @CFSworks found that the max stable frequency is 1.4GHz but I think it may be higher - it's not enough to just set a higher frequency since you also need to set, for example, the right voltage.
c
^^ And that 1.4GHz is for my particular chip. The maximum stable OC probably varies a bit from TP2 to TP2, but I would be very surprised if the top speed in the .dtb (what is it, 1.1GHz?) was too much for some.
d
What I added does not just set the frequency, it uses what I think are the original profiles
I did not modify these
I'm pretty sure they came straight from Allwinner
c
At the very least the .dtsi is from 100ASK, who may or may not have gotten it from Allwinner.
d
I do not see how or why anyone would set this in the dtb and then never use it
s
Famous last words 🙂
c
My guess is Sven just wants to know that Allwinner is willing to stand behind the highest speed profile there, because it could be very bad for TMs if he accepted a change that (sometimes) pushed the T113 out of spec, caused instability, created lots of RMAs, and then when TMs tries to go to Allwinner like "dude your chip sucks" they go "not our fault"
d
LOL, well, ok 🙂 I'm going to test this anyway
To be honest, the gains here are negligible - power savings at idle will be more like 1-2W, and the maximum frequency is just 10% higher, so maybe it's not worth it. I just feel it's the more proper way of driving this CPU, but I agree
Could be just a CE feature
Possibly also disabled by default (with an ability to enable it)
Just wanted to know what the feelings are
c
We could also drop that highest speed setting and merge it. I do like having the frequency scaling, if for no other reason than to minimize the wear on the T113 🤷‍♂️
s
I agree with everything you just said!
d
Now I wonder again what (or who) did you reply to and what does this mean 😛
c
(I would like to keep the highest speed setting though. Is TMs buying enough volume from Allwinner to have a rep over there to ask about the T113's max rated CPU core clock?)
d
It wouldn't be enough to ask just about this
They'll have to deliver us the full profiles the CPU was tested with
In theory, the clock can be set up to 3 GHz iirc? 😄
Hmm, but I guess knowing the max frequency alone will be useful anyway - not to make the profiles but to possibly disable the fastest one (if necessary)
c
The PLL is rated to generate up to 3 GHz, but the CPU core is definitely not rated to accept it 😄
s
I will ask around for that!
d
If you put enough voltage... Who knows 😛
c
I'm willing to experiment a bunch with clocks... not so much with voltage. 😅
s
Sorry! 😅 was referring to the story of @CFSworks to have the frequency scaling but not set a higher clock as highest profile
d
I guess we can also wait to see what's the highest frequency this CPU is rated for. I almost put a question in the other communication channel when you said you'll do that 😄
It's interesting why they do not specify this in the manual
I was searching for any information source about that but could not find anything either
c
Imagine if they came back with "yeah it's rated up to 1.5GHz and [some number] mV"
d
That'd be a win. And will require a lot of testing 🙂 But we know people who'll be willing to put it to the test
I know what people want from the BMC and this additional speed would be highly beneficial
Even if the official firmware ends up with the frequency scaling but with the highest profile removed, I'm going to keep it in the CE, but with a warning that it's above the specs, etc
I guess some sort of the OC could also be possible (like in CM4s, for example), but with even bigger and redder warning 😉
c
I'm hoping TMs takes a similar stance to the Pi Foundation on OCing: You can OC as much as you want without voiding your warranty (but you only qualify for replacement if it stops being stable at stock clocks), but if you overvolt it (beyond some soft ceiling) your warranty is immediately void.
d
The idea is that if someone wants to unlock it, they will have to agree to lose the warranty, do it at their own risk, etc
c
At least, I seem to remember Pi's
config.txt
has some option like that: if you set it, it raises the limit for how high you're allowed to overvolt, but it also blows an eFuse so they know the warranty is void.
d
Well, I'm not going to put any new profiles that set higher voltage. Just an ability to set the OC speed (which will set the governor into the performance mode which also means 1.1 GHz by default) or possibly just add another profile with a speed like 1.2GHz but the same other settings. Just some raw ideas for now, definitely nothing that could damage the CPU - not going to take such risk 😄
By the way, I've seen that this CPU has some internal thermal probes that can be used to return the temperatures, but also to thermal-throttle the CPU if necessary
I'm curious if we can at least read them
c
I did enable the
sunxi-thermal
driver in the latest firmware:
Copy code
# ls /sys/bus/platform/drivers/sunxi-thermal/
2009400.ths  bind         uevent       unbind
# ls /sys/bus/platform/drivers/sunxi-thermal/2009400.ths/
driver                of_node               supplier:2001000.ccu
driver_override       power                 supplier:3006000.sid
modalias              subsystem             uevent
#
I admit I was expecting a
hwmon
entry so I could read the temperature. This driver isn't doing anything interesting at all.
d
You mean the official firmware?
c
Yeah, 1.0.2 has that driver enabled
AH it's a thermal_zone entry
s
I'm pretty sure I saw more than you
Yep
c
Copy code
# cat /sys/class/thermal/thermal_zone0/temp 
30150
So I guess mine's at 30.150°C
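The sysfs value is in millidegrees Celsius, so the conversion above is a divide by 1000. A minimal Python sketch, assuming the `thermal_zone0` path from the session above; `millidegrees_to_celsius` and `read_zone_celsius` are hypothetical helper names, not part of any firmware:

```python
from pathlib import Path

# Path taken from the shell session above; the zone index may differ elsewhere.
ZONE_TEMP = Path("/sys/class/thermal/thermal_zone0/temp")


def millidegrees_to_celsius(raw: str) -> float:
    """Convert a sysfs thermal reading (millidegrees C, as text) to degrees C."""
    return int(raw.strip()) / 1000.0


def read_zone_celsius(path: Path = ZONE_TEMP) -> float:
    """Read the current zone temperature; only meaningful on a Linux target."""
    return millidegrees_to_celsius(path.read_text())


if __name__ == "__main__":
    # Uses the reading quoted above instead of touching sysfs.
    print(millidegrees_to_celsius("30150\n"))
```

On the BMC itself, `read_zone_celsius()` would read the live value, provided Python is available there (which is itself an assumption).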
d
Well, I think it's time to switch to 1.0.2 instead of playing with the old one
I mean maybe not even 1.0.2 but the main branch of the official repo
I need to familiarize myself with all the commits there
I want to re-base the CE with it anyway
This might mean it could work with the cpu frequency scaling for thermal-throttling. I know some people want to use TPi2 in a car, etc, so the temps can potentially be much higher there
This could be yet another benefit of the CPU frequency scaling
c
Critical temperature appears to be 115°C
And yep, looks like "critical" means "immediately cut power at this point"
btw @DhanOS (Daniel Kukiela), since you're interested in CE being a mod of the "SDCard image" edition:
I imagine CE would be that, but with only 2 partitions (boot FAT32 + root ext4) instead of 3, and the "recovery" mechanism removed.
d
Yes, this indeed is how I see the CE version after our talks. I'd love to stay in line with the official firmware, but also make it fully bootable from the SD card. I'm still considering SquashFS, though, since it could simplify the upgrade process without losing any data. I mean it may, but not necessarily. For example, what if someone modifies a file that's subject to an update in a new firmware release? But then, the other option would be a package manager, which won't be trivial to implement anyway. I'm open to suggestions
c
> what if someone modifies a file that's a subject for update with the new firmware release? The "OpenWrt way" is that only files at specific paths (e.g.
/etc/config
) are preserved across upgrades anyway. I think the overlayfs default is if the base file changes but it's been overridden in the upper layer, the upper layer still takes precedence.
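The two rules described here (upper layer wins on lookup, and only selected paths survive an upgrade) can be modelled with plain dictionaries. This is a toy model under stated assumptions, not how overlayfs or OpenWrt are implemented: `resolve` and `upgrade` are hypothetical names, whiteouts and directories are ignored, and `/etc/config` is the only preserved prefix.

```python
def resolve(path, upper, lower):
    """Return the content an overlay-style lookup would see, or None.

    The upper (writable) layer takes precedence over the lower (base) layer.
    """
    if path in upper:
        return upper[path]
    return lower.get(path)


def upgrade(new_lower, upper, keep_prefixes=("/etc/config",)):
    """Model an upgrade that swaps the base image and keeps only preserved paths."""
    kept = {
        p: c for p, c in upper.items()
        if any(p == k or p.startswith(k + "/") for k in keep_prefixes)
    }
    return new_lower, kept


if __name__ == "__main__":
    upper = {"/etc/config/network": "user", "/usr/bin/tool": "user-patched"}
    base_v2 = {"/etc/config/network": "default", "/usr/bin/tool": "v2"}

    # Without pruning, the user's old copy shadows the updated base file.
    print(resolve("/usr/bin/tool", upper, base_v2))

    # With pruning, only the config survives and the new base file shows through.
    lower, kept = upgrade(base_v2, upper)
    print(resolve("/usr/bin/tool", kept, lower), resolve("/etc/config/network", kept, lower))
```

The design question in the thread is which of these two behaviours CE wants for user-modified files outside the preserved paths.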
d
Hmm. The CE will let people install (or just add in some way) additional things like binaries, or even add their own scripts or binaries. That might be problematic to manage using SquashFS then. So another approach to upgrade the firmware would be needed
c
Keeping a clear separation might also be good: maybe make it the rule that additional add-ons always go in
/usr/local
while
/usr
and
/
are the realm of the base image?
(
/opt
is another good candidate for "not part of base installation" packages)
d
And then maybe SquashFS will still be useful
Depending on how the upgrade process will work here
c
UBI makes it easy: make a new subvolume, load the SquashFS image in there, tell the bootloader to flip the A/B images, reboot
No idea how to do something like that on a microSD, where partition layouts are "exact"
b
do you guys flash the img file under windows (with phoenix suit) or is there a linux trick I don't know about (LiveSuit is very old...)
s
Unfortunately only Windows. I haven't seen anybody here reporting success with LiveSuit. There is a shortcut to flash the kernel only, but no guarantees given: https://discord.com/channels/754950670175436841/1080282784570019942/1114620900524966018 you might need PhoenixSuit to recover in case it goes wrong
d
It should, however, work in a VM. VMware Player should let you flash it
s
Im using it in a VM currently
b
I am trying via VirtualBox (in linux) but the board does not show up
I see this device btw:
Copy code
Bus 003 Device 015: ID 18d1:0002 Google Inc. Configfs ffs gadget
via the otg micro usb connector
is "turingpi_recovery-sdcard.zip" needed? (from https://github.com/wenyi0421/turing-pi/releases/tag/v1.0.0)
I wanted to flash the kernel as well, especially because you (guys) put so much work in it
s
The .img file should contain everything. So you're good there. Could you elaborate on what you mean by the board not showing up? Can you see it in the Windows “Device Manager”?
b
In Linux I see the Allwinner USB device for about 2 seconds
I meant PhoenixSuit doesn't see it (I allow all USB devices in VirtualBox)
s
Is your board currently fully working? It boots, etc.?
b
yup
perfect order
just want to update the firmware
can I put it in a dfu like state via a command line maybe?
in order to flash it I mean
this usb device shows up only 2 seconds or so, when I boot the bmc
s
Try to see if you can find it in device manager. I unfortunately am away from keyboard for an hour. Will checkin later to see where you are! Sorry
b
a, thanks
got PhoenixSuit connected somehow in VirtualBox (the VM needed a wildcard USB allowance)
w
Oh this looks promising
b
flashed perfectly, thanks for your help guys
w
Interested to hear if anyone does this successfully
c
Since I've been bricking and debricking my board enough that it's not even an inconvenience for me anymore, I figure I should be the one to try this.
s
are you sure you want to do this while running the kernel :)?
c
The kernel isn't execute-in-place; what's stored in that UBI volume is an image that's loaded into RAM and executed by the bootloader (and the image itself isn't even the final running kernel, it's a gzipped binary + decompression stub that extracts and runs the kernel)
The only risk here is if something goes wrong during the writing process and that volume gets corrupted, the bootloader won't be able to bring the system up again. But the system won't crash and I'll have some opportunities to fix it before I reboot.
s
thanks for clarifying!
c
Copy code
turing-pi/buildroot master $ make clean
...
turing-pi/buildroot master $ git pull
...
turing-pi/buildroot master $ make BR2_EXTERNAL=../tp2bmc tp2bmc_defconfig
...
turing-pi/buildroot master $ nice make
...my gosh this takes forever when ccache isn't yet warmed up, go get coffee...
turing-pi/buildroot master $ cd output/images/
turing-pi/buildroot/output/images master $ ssh root@tp2 uname -a
Linux turing 5.4.61 #31 SMP PREEMPT Thu Oct 20 00:11:14 CST 2022 armv7l GNU/Linux
turing-pi/buildroot/output/images master $ scp boot.img root@tp2:/tmp
boot.img                                                          100% 3490KB   7.0MB/s   00:00
turing-pi/buildroot/output/images master $ ssh root@tp2 ubiupdatevol /dev/ubi0_4 /tmp/boot.img
turing-pi/buildroot/output/images master $ ssh root@tp2 reboot
turing-pi/buildroot/output/images master $ ssh root@tp2 uname -a
Linux turing 5.4.61 #1 SMP PREEMPT Tue Jun 6 12:36:32 MDT 2023 armv7l GNU/Linux
So yes, it does work (see above for the steps I took). There is still the small risk of the new kernel not booting or a power outage right in the middle of the
ubiupdatevol
, but beyond that it's a pretty good method.
(The
ubiupdatevol
took less than a second though, so a power outage during that window would be a real "the universe is out to get you today" moment. 😅)
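The three remote steps in the session above (copy, `ubiupdatevol`, reboot) can be written down as a dry-run command list. A sketch only: `kernel_update_commands` is a hypothetical helper, the host name `tp2` and the volume `/dev/ubi0_4` are taken from that session and may not match other setups, and nothing here is executed.

```python
import os
import shlex


def kernel_update_commands(host, image="boot.img", volume="/dev/ubi0_4"):
    """Return argv lists for: copy the image, write the UBI volume, reboot.

    The volume path is the one used in the session above; it is an assumption
    that it is the kernel volume on any other board or partition layout.
    """
    remote = "/tmp/" + os.path.basename(image)
    target = f"root@{host}"
    return [
        ["scp", image, f"{target}:/tmp"],
        ["ssh", target, "ubiupdatevol", volume, remote],
        ["ssh", target, "reboot"],
    ]


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    for argv in kernel_update_commands("tp2"):
        print(shlex.join(argv))
```

Printing first and pasting by hand keeps the one risky step, the volume write, a deliberate action.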
s
I'm now considering just packing the kernel inside a swu, with a big warning saying that you should only do this if you have PhoenixSuit at hand
I assume they have UBI handlers as well
c
Yeah, the whole thing is a ubi volume update right now actually
d
And the recovery SD card image
There's also one more thing to keep in mind. v1.0.1 changed the partition layout, more specifically the UBI partition sizes, so people should flash v1.0.1 or newer using PhoenixSuit at least once. Otherwise they might get to the point where they can't do an OTA update because the file system exceeds the space available in the old layout
s
Fair point.. gghh I was hoping to find a quick workaround for not having to use PhoenixSuit, but there is no good way.. for now
d
I'm wondering if it'd be possible to create an SD card which contains a full Flash image, like a byte-copy of the flash (all 128MB) that could be just straight copied to the flash "blindly"
I mean a bootable SD card that flashes the flash
I've had this idea for some time, but even after all I've learned lately I don't know the answer to this question. But I guess it should be possible?
c
That is pretty much what I want to do with the recovery SD actually.
d
So this will solve the problem
c
I guess the question is, do we want a stopgap to get more people on 1.0.2, or do we want to bang the drum on a 1.1.0 that updates the bootloader, flash layout, gets the kernel in the rootfs, moves us to SquashFS + overlay, removes all of the PhoenixSuit nonsense, and provides instead some SD image that functions as an installer?
d
Both? 😄
c
We could probably just script U-Boot to do a whole-flash image actually, probably wouldn't take longer than an afternoon.
For convenience, here's my latest build, which is pretty much feature-complete.
d
What, from this, is no longer necessary?
c
(Has: correct boot clocks, can load from NAND+microSD+USB, I²C driver, USB driver, working
reset
command, working PSCI support,
fastboot
support, probably other things I'm forgetting)
Drop the two
fdt set
, the two
mw
, and the
sleep 1
. The rest is probably best to retain.
But I'm thinking we could probably use
mtd
commands to script U-Boot to rewrite the NAND from a file kept on the SD card.
So it boots up, runs the
boot.scr
(as usual), but that script just erases+rewrites the NAND to a golden image.
The fun challenge is the NAND is 128 MiB and we have slightly less than that much RAM available, so we probably want to do it in 2 steps.
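The "2 steps" idea amounts to splitting the flash into pieces that each fit the RAM buffer and end on an erase-block boundary. A minimal planning sketch: `plan_chunks` is a hypothetical helper, and the 64 MiB buffer and 128 KiB erase block used in the example are assumptions for illustration, not measured values for this board.

```python
MIB = 1024 * 1024


def plan_chunks(total, max_chunk, block):
    """Split [0, total) into (offset, length) pieces of at most max_chunk bytes.

    Every piece is a whole number of erase blocks, so each one can be
    erased and written independently.
    """
    if total % block:
        raise ValueError("total size must be a whole number of erase blocks")
    step = (max_chunk // block) * block
    if step == 0:
        raise ValueError("buffer is smaller than one erase block")
    return [(off, min(step, total - off)) for off in range(0, total, step)]


if __name__ == "__main__":
    for off, length in plan_chunks(128 * MIB, 64 * MIB, 128 * 1024):
        print(f"offset {off:#010x} length {length:#010x}")
```

Each (offset, length) pair would then map to one load-from-SD plus one NAND write in the boot script, with the image file on the card split the same way.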
w
Ahh, so I'm out of luck, it's a boot of PhoenixSuite per partition layout change, where there has already been one and another would be required to be shot of PhoenixSuite
I guess I can wait
c
I have also just realized that the
ubiupdatevol
method will only update the kernel and not the devicetree
b
If phoenix suit is the best way to update the BMC firmware right now, where are you guys downloading it from? All the download links I can find look a little... sus... 🤨
d
I'm in bed right now, but remind me in a few hours, I'll DM you a link
w
lol no kidding
this looks like it could be the official repo https://github.com/colorfulshark/PhoenixSuit
the developer seems to do a bunch of Allwinner development in a way that would be hard to fake
not that it's actually open source in the literal sense, it's just a github repo with a bunch of binaries in it
I believe the technical term is "overengineered ftp server" but what do I know 😄
b
Appreciate it - also if DhanOS has any other links - currently considering putting it in a windows VM if I don't trust it much. Can certainly wait until DhanOS wakes up - have a good night! 😴
d
Me too. At this point I have not seen anything compelling to push me to go through the pain of making v1.0.2 happen...
Am I wrong? Do I need 1.0.2?
w
The USB driver stuff seems helpful. Seems like it will be possible with some tweaking to actually mount a CM4 disk in recovery mode directly from the BMC
d
Ehh, all my CM4s are Lites so I use 4x SD cards anyway. I don't think this is a reason for me to get off my ass and install a Windows machine...
w
I set up a VM recently in a last ditch attempt to rescue a borked hard drive
d
I have no running x86 machine, would have to go dig one up. Windows VM on ARM has recently given me real headaches when I was trying to de-DRM some of my ebooks...
c
The part of me that worked on the devicetree updates and kernel adjustments is hoping for people to use it so I/we can get some good feedback, but also if I'm being honest: if you're asking that question, the answer is probably no.
(I'm also taking the resistance to installing 1.0.2 as motivation to remove those hurdles, so being vocal about not installing it is helpful too. ❤️ )
b
I was hoping that the inclusion of picocom as a package would make getting serial consoles from the CM4s easier, as I find that a bit hit-and-miss right now
d
In my view it's a balance. I am interested in playing/experimenting with my board, that is why I have it. But this seems like a much bigger pain in the ass to do than what I see as the upside for me (right now?) in 1.0.2. The balance would tip in favor by either reducing the pain or adding to the upside in terms of features. Things like VLAN support? And like I said, PhoenixSuit is a huge pain for me, I have no x86 machine easily available and my experience with Windows on ARM was just not fun - I have no interest in repeating that, I'd rather play with k3s. If you have a (late-stage experimental?) method of getting a new version of the BMC firmware onto the board, that would definitely re-balance things for me...
I want to add, I do appreciate what you all are doing! I have learned a ton as I am lurking here and I can see how the system is getting better step by step. When I can help, I will - and in the mean time I am happy to be vocal about my resistance... 😇
s
Yeah, same for me. I’ll be skipping 1.0.2
And, yes! I do appreciate the work that is landing and am looking forward to exercising it when it’s a bit easier to deploy!
w
I'm not installing winbloze in order to flash the BMC. It's a linux box. I'm fascinated by the work you, @DhanOS (Daniel Kukiela) and @svenrademakers are doing, and if I had the time I'd have a play too just for the learning laughs, but in my current use scenario all the BMC has to do is bring up and shutdown nodes, and that infrequently. So long as it does that I don't need to update it. I upgraded from factory to CE using the web interface (to get a fixed MAC and IP address) and that's got nearly all the functionality I need for now. The only extra I'd like right now is the ability to change the node startup order and timing when pressing key1 after board powerup. I think I could build that myself from the existing CE source if I really needed it. I don't care about fancy things with the ETH interfaces, overclocking it, partition layouts and filesystem types, uboot, blah blah. Some people do, and it's nice for everyone to have the option of extra functionality, so keep up the good work, but don't make a winbloze machine a requirement. Fortunately I stuck a SDcard in the board before mounting it in the case (it's inaccessible even with the bottom cover removed) so if the official or CE versions need to use that in future I won't have to dismantle everything (unless it requires reformatting with a filesystem type not creatable by the installed CE firmware).
d
You probably wanted to also mention @CFSworks since he put a lot of work into this 😄 As for Windows - in the tinkerer world, in my opinion, you need to be flexible. What you call a Linux box is an OS that you chose. I'm daily-driving Windows, but my box is not Windows-exclusive - I'm running Linux on it too. Sometimes as WSL, sometimes side-by-side, sometimes bare-metal. I've even used a Hackintosh when I needed to. I'm not a fan of Apple, macOS and their ecosystem, but this does not mean I won't use it when necessary. The tinkerer's world sometimes requires some flexibility from you 🙂
I forgot to mention - I fully understand you, and a way of flashing from a Linux box is a must-have. It's inconvenient to install VM software, then Windows in it, only to do this task. I fully agree. Things will, however, change in the near future 🙂 The thing is, some flexibility before this happens is highly appreciated 🙂 For example, compiling the firmware requires (at least currently) Linux and I'm running an Ubuntu VM to be able to work on this.
w
I figured mention was implicit in it being a reply to @CFSworks.
d
Oh, ok. I just woke up. Definitely need more coffee 😄
w
You can always drink more coffee. Or tea. I need more hours in the day.
d
I'd love to have more day hours too. This is a constant problem for me 😄
As far as I know, 1.10 works too. I DM-ed them v1.15, which I know works OK
t
The v1.10 UI doesn't match the documentation (no options to wipe). I used 1.19 successfully, and since it comes directly from an AllWinner site, I trusted it more than the sketchy download sites representing availability of 1.10.
b
Same here , 1.19 works fine
j
The 1.10 i can confirm to work. I debricked my board with that several times
The 1.15 works, too. Only tested once.
t
Ah okay. When the UI in 1.10 didn't offer the "wipe" option as documented, I didn't believe it worked.
j
The 1.10 asks you on flash if you want to full restore or just do a partial flash write. In the first option it asks if the flash should be wiped in advance (As far as I remember)
c
Sharing in case it's of use to anyone or I lose it in the Git shuffle: A U-Boot script for dumping the NAND flash's raw + OOB areas. Calling this boot.scr on an SD card with the new U-Boot on it makes it boot-and-dump. I have used this to obtain an immaculate copy of the NAND contents fresh from the factory.
d
Woah, nice, so now you have a way to dump the data with the spare cell data
Is making the opposite any sort of challenge?
c
The only potential challenge I can think of is bad block handling, but assuming a 100% intact NAND, it should be possible to restore the image back on there.
But if there's a bad block, it becomes unusable.
d
This is what I just thought about actually
Given my lack of knowledge here - what happens if PhoenixSuit flashes the NAND and there is a bad block?
I'm trying to think of a way of handling this and what should be done, if anything can be
I guess you cannot just omit a block that's marked as bad
And the next question would be if we have any docs about the spare cell data layout?
c
We can look up the Macronix flash chip's documentation itself, but my understanding is these types of chips are extremely "raw." They'll report errors when a block cannot be erased, or a page cannot be read/written, but they don't handle any mapping.
Part of the reason we have UBI in there as our biggest MTD partition is because UBI does the wear-leveling, reserving spare blocks, and replacing bad blocks with reserved spares.
It's also why I'm hoping to change the partition map to be just: 1 MiB for U-Boot, 127 MiB for UBI, and the UBI partition has various (dynamically remapped) subvolumes for the U-Boot env and A/B rootfs images, etc.
Since I imagine if the "fixed location" blocks go bad, there's no good way to recover from that.
d
That'd be ideal. For now I'm trying to think how, what you made, can be used to flash the NAND without a need of using PhoenixSuit and it seems like raw writes might be a hit or miss since they'll ignore bad blocks and replace the spare cell data.
The spare cell data could be worked around, maybe even with some bit masks, but then bad blocks would not be handled this way
c
Yeah - I wish it was as easy as imaging a HDD, but the need to deal with the physics of NANDs makes this not so great.
d
Well, imaging an HDD would have the same issue if there are any bad blocks
c
HDDs and SSDs at least do the block remapping on-drive, presenting only a "logical" view to the OS
d
With HDDs this assumes SMART, and SSDs indeed can do that on the fly (due to how they're handling writes to balance wearout)
But yeah, hmm
c
True, only modern HDDs do the remapping.
The way to do a "whole-flash" image without getting tripped up on bad blocks would be to image it only above the UBI layer
So instead of "write this whole .raw file to the NAND" we do "format this region of the flash as UBI and write these subvols"
d
This is what I am thinking of right now. Not that I can come up with anything, just that this is probably the only way
So the bad blocks would be preserved, the checksums will be calculated automagically, etc
c
It's probably not too big of a deal if we don't preserve the bad blocks, since UBI is likely to rediscover them on-the-fly. Still better to avoid erasing and recreating the UBI partition if it's not necessary though.
d
Actually, writing raw image to a different chip might be harmful since it'll rewrite wear info
On the other hand, wear should be pretty well balanced, so this might not matter either
Or the wear data could be preserved with a little bit of work
So many things to consider 🙂
b
I'm now the proud owner of a TPi2 running the 1.0.2 firmware flashed using PhoenixSuit 😎 Thanks to everyone. I found the process a little weird. I ran a Windows 11 VM in VirtualBox - there were 2 USB devices I had to make available to the VM: an initial device called "Configfs ffs gadget" and another, called "V972 tablet in flashing mode", that only appeared once the flashing process had started. I also made sure that VirtualBox was creating a USB3 xHCI controller (instead of USB 1.1 - not sure if that was important) and installed the USB driver provided for the "Configfs" device. Once both devices were passed through to the VM as soon as they appeared, everything progressed fine.
s
I noticed the first time I booted the TP2 board that it’s complaining (in
dmesg
output) of a couple of bad NAND blocks - so, unless this is enough to RMA the board, bad-block handling is going to be required 😩
d
Re-flash the board using PhoenixSuit and an img file and these should go away.
c
A random grumbling but can we get rid of
build/
in the firmware root, it keeps trapping my tabcomplete for
buildroot/
d
Hahaha 🙂 I personally like it this way, but it's not up to me 🙂
c
Here is a "live sdcard" image of 1.0.2 based off of the new U-Boot, but built from buildroot (I'm hoping to get this stuff up on my fork soon). I don't have the overlay set up so unfortunately the SSH server refuses to come up, the
/mnt/sdcard
endpoint can't be created, etc.
I'm using EROFS instead of SquashFS just to evaluate it, but we can move back to SquashFS if we decide we like it better.
t
will there be a version of the updated firmware that can be uploaded over the web?
(perhaps the PhoenixSuit method isn't as difficult as I imagine as well)
t
If you have a Windows machine (possibly VM) and a MicroUSB cable, PhoenixSuit is straightforward. Cross-platform options to replace PhoenixSuit are being evaluated, but nothing has been decided. The conversation is mostly in this thread.
t
done!
used 1.19
seems fine... didn't realize it'd reboot all my nodes in the process, should have guessed that
minor annoyance of course, but what are the requirements to give the nodeInfo tab of the web interface some functionality?
mine just says
unknown
for each node
I acknowledge this is purely cosmetic for now
if they were VMs I'd be using guest-agent software for info... but I assume it's much simpler
t
Yeah. The functionality requires the node to be powered on. I believe the BMC uses a hidden tty connection. It works for CM4s. Not certain about Jetson.
t
I have CM4s running Ubuntu Server 22.04, think I heard Pi OS gave useful output
the CM4s need UART to be on?
I like the additional software on 1.0.2
w
It shows something for Jetsons
t
Yeah. I think the code's text is "jetson", but I'd have to look again and see whether it works. All I have is an Orin NX 16GB.
b
It does work for jetson nano, shows the OS version and host name as well.
d
I'm back and home and I'm looking at this. You only used
mtd read.oob
. Did you mean to use
mtd read
for the data part and
mtd read.oob
for the spare cell data (with the corresponding
fatwrite
commands)? Then, I did a test - I flashed a card with the newest U-boot you shared here, booted the board off of it and erased flash. Then I used the recovery SD card to run
efex
and flash the v1.0.2 using PhoenixSuit. Then I again booted the board using a card with your newest U-boot and dumped the image using
mtd read
(not
.oob
since I don't need this part I believe) and
fatwrite
. I then took my other board and booted off of the SD card with your u-boot, erased the NAND and reversed the rest of the steps -
fatload
and
mtd write
, but the board does not boot, with (what I think is) the important part:
```
device nand0 , # parts = 4
 #: name                size            offset          mask_flags
 0: boot0               0x00100000      0x00000000      1
 1: uboot               0x00300000      0x00100000      1
 2: secure_storage      0x00100000      0x00400000      1
 3: sys                 0x07b00000      0x00500000      0

active partition: nand0,0 - (boot0) 0x00100000 @ 0x00000000

defaults:
mtdids  : nand0=nand
mtdparts: mtdparts=nand:1024k@0(boot0)ro,3072k@1048576(uboot)ro,1024k@4194304(secure_storage)ro,-(sys)
[00.680]LCD open finish
[00.737]ubi0: attaching mtd4
[00.922]ubi0: scanning is finished
[00.925]ubi0 error: ubi_read_volume_table: the layout volume was not found
[00.931]ubi0 error: ubi_attach_mtd_dev: failed to attach mtd4, error -22
[00.938]UBI error: cannot attach mtd4
[00.941]UBI error: cannot initialize UBI, error -22
UBI init error 22
Please check, if the correct MTD partition is used (size big enough?)
```
and then there's a bunch of read errors. Any idea what I am missing?
c
mtd read.oob
seems to be "read raw data, append OOB" not "only OOB." I'm wondering if what's upsetting UBI is that the OOB is missing.
d
As far as I discovered this is what
mtd read.raw
is doing. but I may be wrong
As for the missing OOB, I guess the checksums should be calculated and updated on the fly. I'm not sure what other information from the OOB would be necessary. But I can try to get and flash both maybe
c
It doesn't look like UBI uses the OOB area after all
I guess after loading the NAND image up, try dumping it back down to verify it's the same and nothing got corrupted in the write process?
d
Yeah, this is what I think I am going to do - use the same steps (except for erasing the module, of course 😉 ) to dump the content and see if there are any differences. I just waited to hear if maybe there's something obvious I should know
d
Well, it sounds like it does not use OOB and calculates things on the fly and then keeps them in RAM
c
I'm wondering if there's a way to see the blocks marked bad by the NAND vendor, since maybe those aren't overlapping
d
The difference is:
The left side is what I read initially, the right after restoring and dumping again
The
00
on the right lasts up to
03FE07FF
I'm wondering if this is not a read/write buffer
c
So the left is the "golden" image but the right is your target device?
d
I think so. The left is what I gathered after flashing the board with PhoenixSuit and dumping the flash. The right is what I got after restoring the image from the left and dumping it again (with a failed boot in-between)
This is the only difference
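A comparison like this one can be reproduced without a hex-diff GUI. A minimal sketch, assuming both dumps are already loaded as equal-length `bytes`; `diff_ranges` is a hypothetical helper, and a pure-Python loop over 128 MiB is slow, so this favours clarity over speed.

```python
def diff_ranges(a, b):
    """Return inclusive (start, end) offset ranges where two dumps differ."""
    if len(a) != len(b):
        raise ValueError("dumps must be the same length")
    ranges = []
    start = None
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            if start is None:
                start = i  # a new differing run begins here
        elif start is not None:
            ranges.append((start, i - 1))  # the run ended on the previous byte
            start = None
    if start is not None:
        ranges.append((start, len(a) - 1))  # run reaches the end of the dump
    return ranges


if __name__ == "__main__":
    golden = bytes([1, 2, 3, 4, 5, 6])
    restored = bytes([1, 0, 0, 4, 5, 0])
    for lo, hi in diff_ranges(golden, restored):
        print(f"{lo:#010x}-{hi:#010x}")
```

A single contiguous range that ends on an erase-block boundary would point at one block rather than scattered bit errors.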
c
Same board or different boards?
d
A different one. I used another board on purpose
I never booted up the first one after using PhoenixSuit. Maybe I should to make sure the OS boots there
c
Might be a good idea, but also I'm wondering if the block on the right is bad and can't be erased.
d
It should yield some errors then.
The difference above feels like something that UBI corrected
The flash area contains
UBI#
every
0x40000
starting at
0x500000
, except for the address from above
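The same check can be scripted: walk the dump in block-sized steps and list every block that does not begin with the `UBI#` erase-counter magic. A sketch under stated assumptions: `blocks_missing_magic` is a hypothetical helper, and the `0x500000` start and `0x40000` stride are the values observed in this particular dump.

```python
UBI_EC_MAGIC = b"UBI#"  # erase-counter header magic at the start of each UBI block


def blocks_missing_magic(dump, start=0x500000, stride=0x40000):
    """Return block offsets in [start, len(dump)) that do not begin with the magic.

    Erased, never-formatted, or bad blocks would all show up here, so a hit
    is a lead to investigate, not proof of corruption.
    """
    return [
        off for off in range(start, len(dump), stride)
        if dump[off:off + len(UBI_EC_MAGIC)] != UBI_EC_MAGIC
    ]


if __name__ == "__main__":
    # Tiny synthetic "dump": 16-byte blocks, the last one has no header.
    fake = bytearray(64)
    fake[16:20] = UBI_EC_MAGIC
    fake[32:36] = UBI_EC_MAGIC
    print([hex(off) for off in blocks_missing_magic(bytes(fake), start=16, stride=16)])
```

Running it on both the golden and the restored dump and comparing the two lists would show whether the odd block is specific to the target board.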
The first board booted just fine
So, if I transferred all 128 MB between the boards, I believe this should be enough, but obviously is not?
I also guess my goal here is obvious 🙂 I want to create a bootable card that flashes the board
c
Copy code
It is well-known that NAND chips have some amount of physical eraseblocks marked as bad by the manufacturer. During the lifetime of the NAND device, other bad blocks may appear. Nonetheless, manufacturers usually guarantee that the first few physical eraseblocks are not bad and that the total number of bad PEBs will not exceed certain number.
That first sentence makes me think that the boards have non-overlapping bad-at-manufacture-time blocks.
d
Would I be unlucky enough to have bad block(s) on the other board where the
ubi_read_volume_table
was on the board 1's flash?
I understand that UBI would handle that and I'm assuming
Nonetheless, manufacturers usually guarantee that the first few physical eraseblocks
is for the boot area
c
There are also two copies of the volume table so it'd be weirdly unlucky to have both fall into a hole
d
I'm wondering if the way to go is to dump the boot area as is and the UBI area using some UBI tool
U-boot has the
ubi
command:
ubi read
and
ubi write
maybe?
Also:
- 0x000000000000-0x000008000000 : "spi-nand0"
          - 0x000000000000-0x000000100000 : "boot"
          - 0x000000100000-0x000008000000 : "ubi"
c
I think that's a better idea. I have a weird feeling in the pit of my stomach about the idea of releasing a card that reimages the UBI partition each time, because that's discarding such things as the per-block erase counters, and I don't want to encourage TP2 users to be resetting those each time they're frustratedly trying to diagnose something else.
You would have to use the one from Allwinner's U-Boot because my build doesn't implement
SIMULATE_MULTIPLANE
d
Yeah, this is what I was worried about with this method a few days ago. But wanted to try it anyway with just not touching OOB
Yeah, hmmm
I'll play with it later, I'm trying to finish the Orin NX flashing article series now 🙂
c
The partition tables come from the devicetree btw, they aren't stored on-flash. This is the new table I'm planning on proposing for 1.1.0, which is simpler than the Allwinner layout (2 instead of 4 partitions), and allocates more space to UBI (for better bad block tolerance as well as to maximize the space available to the user)
Everything I find says UBI doesn't touch the OOB area, and that the erase counter is kept in page 0 of each block (page 1 is the volume header, and the remaining 62 pages are user data)
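A rough sketch of reading the erase counter out of the first page of a block, assuming 2048-byte pages, 64 pages per block, and the on-flash UBI EC header field order (magic, version, 3 padding bytes, 64-bit erase counter, VID header offset, data offset, all big-endian). The geometry constants and function name are assumptions for illustration, and the header CRC is not checked here.

```python
import struct

PAGE_SIZE = 2048        # assumed page size of the SPI NAND
PAGES_PER_BLOCK = 64    # assumed pages per eraseblock

# magic, version, 3 pad bytes, erase counter, VID header offset, data offset
EC_HDR = struct.Struct(">4sB3xQII")


def parse_ec_header(peb):
    """Decode the leading fields of the UBI EC header stored in page 0 of a block."""
    magic, version, ec, vid_off, data_off = EC_HDR.unpack_from(peb, 0)
    if magic != b"UBI#":
        raise ValueError("no UBI EC header at the start of this block")
    block_size = PAGE_SIZE * PAGES_PER_BLOCK
    return {
        "version": version,
        "erase_count": ec,
        "vid_hdr_offset": vid_off,
        "data_offset": data_off,
        # pages left for user data after the two header pages
        "data_pages": (block_size - data_off) // PAGE_SIZE,
    }
```

With a VID header offset of one page and a data offset of two pages, `data_pages` comes out as 62, matching the layout described above.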
d
Yeah, I've seen this somewhere during my research
I'm playing with the recovery SD card image, but the U-boot there seems to not have the SPI controller initialized. I tried setting the
mtdids
and
mtdparts
variables, but all I'm getting from
mtd
is
Device nand0 not found!
.
flinfo
also returns an empty string while the U-boot on the board (after interrupting the boot procedure) shows some info. I guess I cannot use the U-boot from the NAND image (
buildroot/output/images/u-boot-sun8iw20p1.bin
) - I tried to flash it on the SD card but the BMC does not try to boot from the SD card. I may be missing something, so any idea will be helpful. It feels like without the SD card image sources not much can be done.
c
Does
mtd info
give a differently-named device?
Allwinner's builds are kinda half-broken so it's hard to know what will and won't work
d
mtd info
returns an empty string before I set
mtdids
variable and
Device nand0 not found!
after I set it to
nand0=nand
=> setenv mtdids nand0=nand
=> setenv mtdparts mtdparts=nand:1024k@0(boot0)ro,3072k@1048576(uboot)ro,1024k@4194304(secure_storage)ro,-(sys)
=> mtd
Device nand0 not found!
=> mtd info
Device nand0 not found!
This works if I boot the board with the official firmware and interrupt the boot process and use the U-boot loaded this way
I guess the U-boot from the recovery SD card does not have some options set or the issue is somewhere else
With the official firmware U-boot I used
env print
to dump the variables and tried to set the different ones on the recovery SD card U-boot, like:
mtd_name=sys
mtddevname=boot0
mtddevnum=0
mtdids=nand0=nand
mtdparts=mtdparts=nand:1024k@0(boot0)ro,3072k@1048576(uboot)ro,1024k@4194304(secure_storage)ro,-(sys)
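To make the layout in that `mtdparts` string easier to read, here is a small sketch that turns it into offsets and sizes. It assumes a 128 MiB device and only handles the subset of the syntax used above (decimal offsets, `k`/`m`/`g` suffixes, `-` for "rest of the device", optional `ro`). The parser is illustrative, not U-Boot's own.

```python
import re

UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
PART_RE = re.compile(
    r"(?P<size>-|\d+)(?P<unit>[kmg]?)(?:@(?P<off>\d+))?\((?P<name>[^)]+)\)(?P<ro>ro)?"
)


def parse_mtdparts(value, device_size):
    """Return a list of (name, offset, size, read_only) tuples."""
    if value.startswith("mtdparts="):
        value = value[len("mtdparts="):]
    _device, _, parts = value.partition(":")
    result = []
    cursor = 0  # where the next partition starts if no @offset is given
    for spec in parts.split(","):
        m = PART_RE.fullmatch(spec.strip())
        if m is None:
            raise ValueError(f"unsupported partition spec: {spec!r}")
        offset = int(m["off"]) if m["off"] else cursor
        if m["size"] == "-":
            size = device_size - offset  # "-" means the remaining space
        else:
            size = int(m["size"]) * UNITS[m["unit"]]
        result.append((m["name"], offset, size, bool(m["ro"])))
        cursor = offset + size
    return result
```

For the string above, `sys` starts at `0x500000`, which lines up with where the `UBI#` headers begin in the dump.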
c
This is the reason for the long delay between
[    6.771020] read strings
and
[   13.608496] insmod_device_driver
while booting
It's running in the filesystem root so it's scanning literally the whole filesystem, and it will break if any other file named
usb_device
exists, and I bet changing it to
/sys/bus/platform/devices/*/usb_device
cuts those 6 seconds out of the boot
diff --git a/tp2bmc/board/tp2bmc/overlay/etc/init.d/S11adb_server b/tp2bmc/board/tp2bmc/overlay/etc/init.d/S11adb_server
index 4473175d..d4bd9a72 100755
--- a/tp2bmc/board/tp2bmc/overlay/etc/init.d/S11adb_server
+++ b/tp2bmc/board/tp2bmc/overlay/etc/init.d/S11adb_server
@@ -64,7 +64,7 @@ function start_adb(){
     sleep 1
 
     # enable udc
-    UDC_DEVICE=`find -name "usb_device"`
+    UDC_DEVICE=/sys/bus/platform/devices/*/usb_device
     cat $UDC_DEVICE
     #cat /sys/bus/platform/drivers/otg\ manager/soc@3000000:usbc0@0/usb_device
⏬ ⏬ ⏬
[    6.664612] read strings
[    8.731075] 
[    8.731075] insmod_device_driver
[    8.731075]
d
Hah, nice!
I've seen this delay
It was actually much much longer before we figured out the slow clock issue
And such simple fix here 🙂
c
Yeah, I suppose @svenrademakers can quickly commit it if he likes. I don't think it's worth a full PR, especially if I'm about to experiment with using USB CDC-ACM instead of adbd
d
My bet would be on PR, actually. Let's see 😄
By chance, you were right @CFSworks about the max CPU speed - it's indeed 1.2 GHz 🙂
We have a full OPP table for the CPU speeds
We got a dtsi file which contains some changes and I found that wenyi (who prepared the original firmware) modified it for some reason
Will the updated dtsi file be of any use for you?
@svenrademakers what do you think about the CPU speeds? They're 480MHz, 720MHz, 912MHz, 1GHz, 1.1GHz and 1.2GHz, all with all of the settings for stable clocks. I'm going to update 1.0.2 with these values, apply
cpu_freq
, and run it on one of my boards
s
I think it's okay, 2 things come to mind:
* i would like to have some stress proof. E.g. let a board run at full load for a couple of days
* im going to verify with wenyi why he decided to underclock.
I will run this test and everything, but it would be nice if some people were also already running with this change. Fyi, im not available this week. I can start looking at this next week 🙂
d
I can stress-test it. I don't think he decided to underclock. I feel more like he just set some frequency. He did not even set
cpu_freq
. Also the
dtsi
file he has is older than what we just got. The
dtsi
file included with the firmware does not even contain the 1.2GHz entry. I guess we can find some people who'll be willing to test this 🙂
If I had to guess, he chose a safe value without playing with the voltages. The voltages are part of the OPP table, but he doesn't seem to be setting any of them.
s
Gotcha, thanks for sorting this out! Much appreciated
c
Before I forget: could someone (or me) check if the Macronix NAND does hardware ECC? If not, we should probably enable ECC in software (to make use of the OOB area), as that's generally pretty easy to do.
s
according to the datasheet, it does 4-bit ECC
c
So this week I learned a lot about NAND and bad blocks. Apparently it's not considered a manufacturing defect unless more than 20/1024 blocks go bad during the normal erase lifetime of the chip. The firmware is therefore supposed to reserve a pool of 20 good blocks and swap them out for the premature bad ones. It sorta does this already, but it reserves 10 pairs, not 20 independent blocks, so 11 random failures is still enough to overwhelm it. 😔
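A quick sketch of why pairs are weaker than independent spares. It assumes a simplified model (not taken from the driver source) in which blocks 2k and 2k+1 form a pair and a single failure retires the whole pair. All names here are hypothetical.

```python
def pairs_lost(failed_blocks):
    """Number of pairs retired when the given physical blocks fail."""
    return len({block // 2 for block in failed_blocks})


def survives_with_pairs(failed_blocks, spare_pairs=10):
    return pairs_lost(failed_blocks) <= spare_pairs


def survives_with_blocks(failed_blocks, spare_blocks=20):
    return len(set(failed_blocks)) <= spare_blocks


def prob_all_distinct_pairs(n_failures, total_blocks=1024):
    """Chance that n random failed blocks all land in different pairs."""
    p = 1.0
    for i in range(n_failures):
        # i pairs are already hit: 2*i blocks are off-limits, i of them already failed
        p *= (total_blocks - 2 * i) / (total_blocks - i)
    return p
```

Under this model `prob_all_distinct_pairs(11)` comes out at roughly 0.95, so 11 random failures would exhaust 10 spare pairs most of the time, while 20 independent spares would still have room.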
b
Wow!
s
Interesting, thanks! Is the reservation of 10 pairs rather than 20 blocks something which could be resolved in a future firmware update/NAND partition remapping, or is that decision now fixed and immutable?
t
If I remember correctly from my one visit to a NAND packaging factory in China, bad NAND blocks are fully discharged during profiling of die on the wafer. It's part of the die binning process prior to packaging.
c
It's definitely not immutable and I spent last week writing some Rust code that can reformat the NAND to the right layout. It works but it also erases the UBI layout volume and now I have to write code to restore that. 😅
Apparently the Macronix datasheet for the TP2's NAND says they ship from the factory with good blocks fully erased (all-FF) and bad blocks detected at the factory will have 00 as the first byte in the spare area of pages 0 and 1 of any bad block, and on first boot (i.e. at the TP2 factory) the user of the NAND is supposed to scan for this and populate a bad block table with it.
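A minimal sketch of that first-boot scan, run against a raw dump rather than the chip itself. It assumes each page in the dump is stored as 2048 data bytes followed by 64 spare bytes, with 64 pages per block. Those sizes and the function name are assumptions for illustration.

```python
def factory_bad_blocks(raw, page_size=2048, spare_size=64, pages_per_block=64):
    """Return indices of blocks whose page 0 or page 1 spare area starts with 0x00."""
    stride = page_size + spare_size          # one page plus its spare area
    block_bytes = stride * pages_per_block
    bad = []
    for block in range(len(raw) // block_bytes):
        base = block * block_bytes
        for page in (0, 1):
            marker = raw[base + page * stride + page_size]  # first spare byte
            if marker == 0x00:
                bad.append(block)
                break
    return bad
```

The returned list is what a bad block table would be seeded with. A real implementation would read the spare area from the NAND directly and would need to run before the first erase, since erasing wipes the factory markers.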
Cool that you visited a factory in China though - that's something I might want to do (well, maybe in Taiwan given the state of things in China heh)
t
Yeah. This was outside Shanghai. That factory may be PRC-only now. The company built another factory in Malaysia to supply other markets. Most Singapore electronics manufacturing operations moved to Malaysia 20-25 years ago.
d
I just modified, compiled and flashed 1.0.2 with the
cpu_freq
enabled and updated
dtsi
, but the max reported frequency is 1.1GHz. From the OPP table, last 2 entries:
```dts
opp@1104000000 {
    opp-hz = /bits/ 64 ;
    clock-latency-ns = ; /* 8 32k periods */
    opp-microvolt-a0 = ;
    opp-microvolt-a1 = ;
    opp-microvolt-b0 = ;
    opp-supported-hw = ;
};
opp@1200000000 {
    opp-hz = /bits/ 64 ;
    clock-latency-ns = ; /* 8 32k periods */
    opp-microvolt-a0 = ;
    opp-microvolt-a1 = ;
    opp-supported-hw = ;
};
```
1.2 GHz has
opp-supported-hw = <0x1>
, so I guess the hardware version of our chip is different. Do you have any idea on how to read the hardware version? I can search too, but maybe you know
c
Not offhand. I'd probably start by reading the source code to see what it does with the
opp-supported-hw
property.
d
This is what I did, but I did not find the right spot yet. I guess 1.1GHz is a limit for this chip anyway so this is what we use. I'll stress-test it and then share image for everyone wanting to try it. Then I'll do a PR
c
I'm hoping today or tomorrow I get a branch pushed up on my fork that builds a bootable SD image, as well.
d
But we are talking with the simulate multiplane off?
c
Probably on for now (so that people can mount their UBIFS root if they want) but the goal is off. I have a mostly-written Rust tool that is showing some promise in being able to unscramble the egg (preserving UBI EC headers!) and migrate away from this accursed flag once and for all.
d
I was going to make the SD card image based on the current 1.0.2. Does this mean I'd only lose time? I'm still talking about the flashing SD card for the current firmware
c
OH okay we're talking about different things
I'm talking about
d
Yeah, my idea was to make such image only to have the U-boot in a version capable to access the flash. I wanted to remove all the partitions and create just a FAT32 one to contain the images dumped from my working and freshly flashed board. I attempted to use the recovery SD card, but it does not contain the components to access the flash
So, are we really talking about different things here? If you make a bootable SD card, I guess the U-boot should be able to access the flash, and this is all I'm after
c
The version of U-Boot I'd be shipping in that image won't do SIMULATE_MULTIPLANE, but the kernel (for the time being) would.
s
did you try out xfel? it's not going to fix writing flash in uboot, but it might be handy for you as well. i've been playing with it for the past couple of days and it works great. was able to read/write the mtd partitions. loading a program into ram also doesn't seem to be any issue:
xfel ddr t113-s3
xfel write 0x43000000 buildroot/output/images/u-boot-sun8iw20p1.bin
xfel exec 0x43000000
https://github.com/xboot/xfel
im working on obsoleting PhoenixSuit. since what you guys are working on has a big overlap, i want to pick your brains a bit. as i see it, i want to have 2 (3?) ways to update the firmware:
1. The default way, via the swupdate agent. It should be able to download the latest firmware itself and install it. (We need to update the uboot env a bit so it can A/B the rootfs and kernel partitions in case updating went wrong.)
2. When the user has bricked the device, they can use the "imager tool". It will put the board in FEL mode and write a complete NAND image to it. There are 2 ways:
   a. The easy way: use xfel write to write the whole flash.
   b. Write a small FEL tool that loads from RAM, so you could use the proprietary driver to write certain partitions etc. while being able to preserve bad blocks and write counters.
3. SD recovery image?
As a first step im looking into option 2a. Writing the magic word to the correct register to put the device into FEL, and writing binaries to NAND, i can do. Now the question is how to forge the raw NAND image, because of course the LiveSuit/PhoenixSuit image is all proprietary stuff. Yesterday i just created a ubi image for the /sys partition and tried to write it, but i ran into ubi EC header issues, as @CFSworks is seeing as well. As i understand it, SIMULATE_MULTIPLANE is just concatenating the data of 2 eraseblocks. That would mean that the ubi block headers should still be there? anyway, i have the raw binary data of a correct ubi volume, so we should be able to quickly figure out what exactly the driver is expecting as headers. There are still many questions left which i hopefully can answer soon:
- Does PhoenixSuit write any integrity checks, CRCs?
- What about GPT/MBR?
- Do we need all these partitions in 'sys_partitions.bin'?
- Can we just turn off simulate multiplane, and what are the consequences?
d
I didn't, I'll take a look. The biggest issue for now is booting off of an SD card in a way that also has the flash module drivers loaded.
2a. I literally tried that here: https://discord.com/channels/754950670175436841/1080282784570019942/1117798487329878046 and it did not work. It looks like you have to write the U-boot directly, but for the UBI part you need to use UBI. And this is where I got stuck for a little bit - the stuff needed to do this properly does not exist in the recovery SD card image that I wanted to use as a base. This is why I'm going to try to make a bootable SD card off of 1.0.2 and use this as a base (this is basically 3. in your list)
I guess you'll find some useful information above, but what I tried to do was to flash the board with 1.0.2 and dump the NAND onto the SD card, then restore the NAND on another board with this image. But it does not work this way.
s
did you copy all partitions?
d
Whole 128MB of the NAND
One way or another, I believe you need the ubi tools to be working to flash ubi - no matter if this is a flasher or flashing (recovery) SD card
s
im not sure if this mtd command you use tries to adhere to some ecc scheme. i need to read a bit about this
did you try to read, erase and dump back the data to the same board as well? just for sanity reasons
im able to read write back without any issues using the spinand command of xfel
d
I could also copy the oob area, but this is not the way
I did - I erased one board and flashed it using PhoenixSuit, then dumped the whole NAND. On another board I erased the whole flash and restored the dump
Even if this worked as expected, it's not the right way for another reason - after some time, some bad eraseblocks are going to pop up. It is even OK if factory-new flash contains some bad blocks (but only in a later part of the flash; the 1st stage bootloader area must be free of bad eraseblocks). So, UBI handles bad eraseblocks on the fly, but this also means that an image dumped from one flash might contain some bad blocks, and when restored on another board it might write over bad eraseblocks. This is why you really need to use ubi tools to dump and write ubi partitions
UBI scans the volume when it loads up, reads, for example, which blocks are bad and keeps this in RAM and remaps the blocks on the fly. But this means you cannot assume same image restored (using, for example, PhoenixSuit) on 2 boards will yield same binary image
So, my current thinking is you can flash U-boot using
mtd
and UBI partitions using
ubi
, and I would have this working already if only the damn flash was initialized with the recovery SD card image 😄
s
what do you mean with reading the oob area? this should reside before each page of an erase block?
d
OOB (out of band) data is spare-cell data stored alongside the cell array.
It holds information like bad blocks, ECC, etc
You can read it using
mtd.oob
s
i was assuming these mtd commands are just reading raw erase blocks, but it seems they don't? im going to check first what exactly they're doing before i bombard you with any more questions :p
d
There is
mtd.raw
that reads both cell array and spare cell array. Otherwise
mtd
reads only cell array and
mtd.oob
only spare cell array
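A small sketch of how those three views relate, assuming a raw dump stores each page as 2048 bytes of cell array followed by 64 bytes of spare area. The sizes and the function name are assumptions, not taken from the U-boot source.

```python
def split_raw_dump(raw, page_size=2048, spare_size=64):
    """Split a raw (data + spare) dump into separate data and OOB byte strings."""
    stride = page_size + spare_size
    if len(raw) % stride:
        raise ValueError("dump length is not a whole number of raw pages")
    data = bytearray()
    oob = bytearray()
    for off in range(0, len(raw), stride):
        data += raw[off:off + page_size]           # what a plain read returns
        oob += raw[off + page_size:off + stride]   # what an OOB-only read returns
    return bytes(data), bytes(oob)
```

Splitting a raw dump this way makes it possible to compare only the data part between two boards while leaving the per-chip spare area alone.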
Do not hesitate to ask if you have any questions. I don't know this much, but I'll be happy to share what I know
Also, usually you do not want to touch the spare cell array (oob) - you do not want to overwrite the data there, like bad block markers, and the ECC is going to be calculated and written on the fly (by the driver)
s
i understand, but if mtd.raw works and normal mtd not we know for sure there is an issue with the ECC config
d
I didn't try
mtd.raw
, I don't want to overwrite the oob since this can mean some troubles
I was, however, considering writing some tool that'd extract and copy over only the ECC data
But it should be calculated on the fly, so maybe?
Either way,
mtd
(or even
xfel
) is not a way to copy/flash the UBI area
But indeed the ECC would need to be working during the flashing process
s
there is a script btw in the sunxi-tools repo that calculates ECC values as part of an image-creating script, it's called
nand-image-builder
d
Maybe worth a peek, but this should work on the U-boot level. Since CFSworks asked about the ECC I guess it can be "just" enabled and should just work
I mean, recovery/flashing might use a working buildroot Linux to do the flashing, but it'd have to be separate from the flash, so flashing by downloading the image would not really be a thing unless you could boot off of a RAM partition. I don't think that'd be the right way. I guess flashing on the U-boot level would be the most flexible.
s
it could be useful to A/B some stuff with. im not completely agreeing there, but at this point i want to be able to successfully generate a ubi partition and write it. The next step is to see what is the best implementation and on what level. i will take a bit more of a dive into the driver, the only source of truth, see if it matches with what ive generated. i will come back with some questions soon 🙂 thanks
d
May I ask what you are not agreeing with? Like the part about UBI? The fact here is the flash chips on different boards will contain bad blocks, and will contain them at different locations (and UBI handles this), and writing raw NAND will simply write over bad blocks too. This is why writing the UBI area with the
ubi
tool would be the right way of doing this.
Like at this stage I'm 100% sure writing raw NAND is not the way to do this
One more thing - I've seen TPi2 boards with bad erase blocks already and helped their owners recover them. The OS was not booting, so the PhoenixTool did not start.
efex
was usually the way, then re-flashing so UBI can handle the bad blocks and swap them, which gets the OS running again
Oh, right, I forgot to mention that UBI reserves some blocks for swapping iirc. Anyway, it can re-map the blocks on the fly, which means 2 boards flashed with PhoenixSuit do not have to have the exact same binary content in the NAND
s
im not disputing the importance of bad PEB handling, ECC or preserving OOB areas for that matter. I was suggesting to narrow down the problem by trying out a simpler construction like writing raw NAND. Whether it's good enough for a production environment.. probably not. I don't agree with uboot being the component that does the heavy lifting of interfacing with the mtd device (in the context of firmware upgrades). I still think swupdate is a proven update agent, which we need to center our OTA process around. Yes, in the current state there are a lot of ways to "brick" your device and it's not good enough yet. But i think we can get everything to a place where we can have atomic updates for all parts of the firmware. im sorry if i misunderstood you, there is a lot of terminology in your text without any context
d
Yeah, my first goal was to do it with the "raw NAND" write approach just as a PoC. I also wanted to keep the card size as small as possible, thus I thought using U-boot would be the best approach. I kind of forgot about OTA, though, so yes, you are right here. As for the terminology - if something's not clear, feel free to ask, I don't know which terms are foreign to you.
In the end, the goal is the same, no matter the method 🙂
c
For 1 I'm trying to simplify a bit, wondering if instead of using a full .swu file, our OTA updates should be just the rootfs (EROFS/SquashFS) image. The advantage here is U-Boot in "debricking" mode knows what to do with the file: just write it to a UBI volume and reset. (We sort of do want to break compatibility with 1.0.x's OTA mechanism anyway.) For 2, I've been using sunxi-tools's sunxi-fel to load my (mainline) U-Boot and it works great. I haven't yet experimented with it, but the DFU mode might be able to expose the raw NAND to an imaging tool that we can then have migrate the flash layout. For 3 (which is my main focus for the time being), I'm working on a tool that can do the full installation.
I have a 3/4-written flashing tool that handles migrating the SIMULATE_MULTIPLANE EC header layout to a flat layout. It doesn't yet handle writing the UBI layout volume, and since finding out that's necessary, I might as well add the ability to write the rootfs volume too. That would make it a pretty complete tool. I've also kept the NAND access abstracted away behind some traits, so we might be able to make it an over-USB installer without much added effort.
s
We talked about having a squashFS, i assume you also will have the kernel in this filesystem? couple of questions:
- in this design, how is uboot getting the rootfs to write?
- how would one update the bootloader itself, secure_storage and boot0 partitions?
- did your solution also account for the possibility to change mtd partitions?
- if i read correctly, you will use uboot to "debrick" the device by writing a new rootfs for example. How would one recover from a broken uboot? Is this where you intended the USB installer for?
c
1. If in debricking mode, I guess it'd be provided via tftp. I haven't really given it much attention yet.
2. The new layout would do away with the secure_storage and boot0 partitions: one for U-Boot, one for UBI. U-Boot wouldn't be updated OTA (it's too risky) but only when the full recovery installer is used.
3. Yes, new split is 1/127 MiB (vs. the current 1/3/1/123)
4. A broken U-Boot would require either the USB or SD recovery methods. But we won't be updating U-Boot often so it shouldn't spontaneously break.
d
2. Could not U-boot upgrade be an option? For example, some FW versions might require a new U-boot, or just make it a separate swu? I guess it'd be easier for many people to OTA update the U-boot instead of using the other ways (which they'd have to use if the OTA update failed and the board does not boot)
c
It could be, if we really wanted it. The SPL can load U-Boot from a UBI volume, which would be a safer way to implement this, but it would slow down the boot process since UBI requires a full scan to load.
I feel like we should probably focus on not needing to update U-Boot beyond the one time though.
d
I meant more flashing U-boot to its desired Flash area than loading it from UBI. I only see this as a more convenient way of updating it. But I agree the need to update the U-boot would be rare, but the question is how many times we can ask a user to use SD cards or cables to upgrade the firmware. I kind of saw this as... once.
c
We could include a separate option that rewrites the U-Boot partition from the running firmware easily enough. It's really more that we should write to that area very sparingly and make sure the user is well-equipped to recover from it going wrong.
s
I would like to have this after we have this whole mechanism stabilized. One solution i can think of is just to have 2 bootloaders on flash, which is not unheard of. You always flash one of the 2 so you can always fall back on the other one
Basically what you said @CFSworks 😂
d
I'm not sure you can have 2, can you?
c
The 2-bootloader approach isn't a bad idea but we'll need to figure out how to make sure that the failover works when we need it to.
Ultimately I think we can only have 1 SPL but we could always patch SPL so that it first tries to load from offset X and then Y if that fails.
d
Yeah, this is what I mean
But then you give it an address and it starts executing the code there; you know neither the content nor whether it's correct
c
I do know the U-Boot monitor (what the SPL loads) is a uImage, and that includes either a CRC or checksum, so the SPL has some way to detect load errors. But of course that won't help if the problem is in the binary itself (e.g. programming mistake).
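For reference, a sketch of the kind of check that header allows, assuming the legacy uImage format: a 64-byte big-endian header with magic `0x27051956`, a CRC32 over the header (computed with its own CRC field zeroed) and a CRC32 over the payload. This is a host-side illustration in Python, not the SPL's actual code.

```python
import struct
import zlib

UIMAGE_MAGIC = 0x27051956
# magic, hcrc, time, size, load, ep, dcrc, os, arch, type, comp, name
HEADER = struct.Struct(">7I4B32s")


def uimage_is_intact(image):
    """Check the header CRC and data CRC of a legacy uImage."""
    if len(image) < HEADER.size:
        return False
    magic, hcrc, _time, size, _load, _ep, dcrc = HEADER.unpack_from(image)[:7]
    if magic != UIMAGE_MAGIC:
        return False
    header = bytearray(image[:HEADER.size])
    header[4:8] = b"\x00" * 4            # the header CRC field is zeroed for the check
    if zlib.crc32(bytes(header)) != hcrc:
        return False
    payload = image[HEADER.size:HEADER.size + size]
    return len(payload) == size and zlib.crc32(payload) == dcrc
```

A check like this catches a truncated or corrupted write, which covers the interrupted-flash case, but as noted it says nothing about whether the code inside is correct.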
d
Or flashing stopped in the middle (maybe?)
Or would you CRC-check some part of the flash?
Like 1MB or something like that?
c
I don't see it as a likely issue where the bootloader spontaneously fades away from NAND, so the only real issue is this imo
d
I mean some bad erase blocks might also appear, but I guess nothing will help then anyway 🙂
c
I'm hoping the SPL can be made to skip bad eraseblocks, so that at least the bootloader can be reflashed to recover from that situation.
If 20/1024 EBs are bad, and we assume a random distribution of them, then this means there's a 15% chance that one of the blocks in the boot partition is bad.
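The arithmetic behind that estimate, as a short sketch. It assumes the boot partition spans 8 eraseblocks (1 MiB at 128 KiB per block) and that the 20 bad blocks are spread uniformly at random over 1024. The function name is hypothetical.

```python
from math import comb


def prob_any_bad(blocks_in_partition, total_blocks=1024, bad_blocks=20):
    """Probability that at least one block in the partition is bad."""
    # Chance that all bad blocks fall outside the partition
    none_bad = (
        comb(total_blocks - blocks_in_partition, bad_blocks)
        / comb(total_blocks, bad_blocks)
    )
    return 1 - none_bad


p_boot = prob_any_bad(8)  # close to the ~15% figure above
```

The same function shows how quickly the risk grows with partition size, which is one argument for keeping the raw boot area small.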
d
Isn't the bootloader meant to be a part of the root filesystem?
I mean that'd be the 2nd stage bootloader
I mean the 1st stage, the U-boot
Unless I'm missing something (which is highly possible)
c
BROM (built into T113) -> SPL (first block of NAND) -> U-Boot (subsequent ~5 blocks of NAND, at most 8) -> U-Boot env (UBI volume) -> boot script (in rootfs, updated/provided with firmware) -> kernel (also in rootfs)
d
OK, this makes more sense now. Thank you
c
I'm hoping we can manage 99% of the U-Boot behavior changes we need in the future with just env and boot script updates
d
Yeah
c
But there may come a time when we need to build some new feature into U-Boot, so it might be good to have an idea of how to upgrade it again.
d
Now I see how indeed there'll be not much need to update the U-boot
s
This 🙂 and we always need it at the most inconvenient time 😂
c
I suppose we can have a "minimum bootloader version" that's checked before any OTA update is installed, and if it's not met, we bug the user to go read the documentation. The documentation then says "use either the SD or USB update methods if possible; if you want to live dangerously, SSH in and dd this file over /dev/mtd0, then reboot, but be ready to use the SD or USB if it goes wrong"
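A minimal sketch of that pre-install check, assuming plain dotted numeric version strings. The function names and the idea of the update carrying its own minimum-version field are hypothetical here.

```python
def parse_version(text):
    """Turn '1.10.0' into (1, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def may_install(required_bootloader, installed_bootloader):
    """True if the installed bootloader meets the update's minimum version."""
    return parse_version(installed_bootloader) >= parse_version(required_bootloader)


def check_update(required_bootloader, installed_bootloader):
    if may_install(required_bootloader, installed_bootloader):
        return "ok"
    # Hypothetical message; the real wording would point at the documentation
    return (
        f"bootloader {installed_bootloader} is older than required "
        f"{required_bootloader}; see the documentation for the SD/USB update methods"
    )
```

Comparing tuples rather than strings avoids the usual trap where "1.10.0" sorts before "1.9.0".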
d
This is kind of what I thought, just a different way of flashing
s
This would be acceptable. Most users wont bother updating. The ones who do, usually have knowledge enough to work the commandline
c
I'm also not foreseeing a mandatory U-Boot update unless we, like, completely change the rootfs type in the future.
(We will probably still be updating our U-Boot but it will mostly be quality of life improvements.)
@svenrademakers Also a few emails in my discussion with U-Boot's sunxi maintainer you might want to weigh in on: https://lore.kernel.org/u-boot/20221206004549.29015-1-andre.przywara@arm.com/T/#m74cd78162f75895964f8f1fba744a01385d08d7c
The TL;DR seems to be "upstream Linux would really love a devicetree for the TP2 sooner rather than later" but I'm specifically trying not to touch the mainline kernel until I land these U-Boot changes, because I just know that's going to be a whole separate rabbit hole.
s
I understand. Do you have any clue already whats waiting for us with regards to mainline linux. Porting spinand drivers i suppose?
c
I'm expecting, once we get off of SIMULATE_MULTIPLANE, we can just drop in the upstream spinand driver and it will work out of the box.
It's more that I haven't scoped it out, though. It might take an afternoon, it might take a week, it might take a month.
But it seems like upstream Linux already supports T113 far better than does U-Boot so maybe it will be on the easier side.
s
I guess we will find out soon 😅
To circle back, what issue do you have regarding ubi volume table? And is any of your work public?
c
I'm wanting to put it up on a WIP (as in "please don't pull I want to rewrite the commit history") branch on my fork soon. But the issue is just that I'm not yet writing one. I take care of creating all of the EC headers already, but Linux rejects it because it's still missing some info about what volumes exist. So I need code to write UBI volumes too.
Also before I forget, two things relating to the EEPROM: 1) Remember that the EEPROM has to have FFFF as the first 2 bytes, to stop the switch from loading it as a config. 2) If I'm right and the TP2 incorporates the RTL8370MB demo board schematic verbatim, the EEPROM autoload should be configured by a pull-up/-down resistor, depending on which is populated. If so, it's very easy to change that in the factory without having to spin a new PCB, so that should probably be done today. (And then that frees up the first 2 bytes for, perhaps, a board revision ID, since only the first revision needs FFFF there)
Also, the Winbond SPI-NOR out past node 4 probably doesn't need to be there, since I suspect that's also a holdover from the RTL reference design.
Okay I have a branch up. It's a WIP, and I reserve the right to rebase and forcepush and all other kinds of ugliness, but it's a start. Here's an SD card image that can live-boot (it does not persist filesystem changes however):
Things which are WIP: 1) The
uboot.env
needs more fleshing out; I'll probably be focused on this for a bit (both the boot logic and recovery logic) 2) The
mount_overlay
script needs to be implemented (so that it can do a normal, non-safemode boot) 3) There's still no
installer
binary, so the "installer SD card" functionality doesn't do anything (I still need to push the Rust code for this) 4) The Ethernet switch management needs testing: confirm that the RJ45 jacks are isolated when in the bootloader and bridged when Linux is up 5) The uboot build config is definitely not final, that should be done according to what features we want
s
I would love to push for us to run mainline linux, but it's currently 3rd/4th on my list:
1. get the userspace daemon off the ground
2. improve firmware update experience
3. mainline/RTL8370MB
Let me know if i can take something off your hands in order to get some DT upstream. this will only be interesting once we cut loose from the proprietary nand driver.
c
That's pretty much my roadmap too. I want to land the U-Boot stuff (which is #2 on that list, in my mind) and have that under control before I crack open Linux code.
But yeah, as much as I want to help out upstream sooner rather than later, I don't think it makes sense to be providing them with a DT that we ourselves aren't (yet) relying upon.
s
probably will create more noise and confusion
c
The more I think about it, the more it seems like the right play is to wait until we're on mainline, get the basic DT upstreamed, but fork it downstream and add the other features (mostly the Ethernet switching stuff) and cut a few releases from the forked DT... then once a few months go by without us needing to update the DT anymore, we upstream our changes and try to rely exclusively on mainline.
Luckily DTs don't need to change that much, once they're "final."
t
@CFSworks hopefully clear enough
c
Very clear thanks! (And that explains why I can't do adb over USB/IP - guess that's another minor driver fix that's needed)
I spent longer than I expected debugging this mechanism, and I'm a little surprised U-Boot doesn't (apparently?) have this as a built-in feature, but nonetheless I'm excited about it for its upgrade usefulness:
The idea is a new version of the firmware can be written to a different UBI subvolume that isn't in the normal bootpath (e.g. rootfs_new), and the nextboot mechanism can be used to schedule it as the next booted firmware. Then, once(/if) it boots successfully, it renames itself to rootfs. As a result, the worst a horribly-timed power outage can do is roll back the upgrade.
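A minimal sketch of that flow as an in-memory model, not the actual implementation. It assumes that nextboot is a one-shot override consumed by the bootloader, and that the old rootfs volume is dropped when the new one renames itself; neither detail is confirmed above. The type and function names are hypothetical.

```rust
/// In-memory model of the boot state. A real implementation would read and
/// write the U-Boot environment and rename UBI volumes instead.
#[derive(Debug, Clone, PartialEq)]
pub struct BootState {
    /// Volume names currently present (e.g. "rootfs", "rootfs_new").
    pub volumes: Vec<String>,
    /// One-shot override naming the volume to try on the next boot (assumed).
    pub nextboot: Option<String>,
}

/// Name of the volume in the normal boot path.
pub const DEFAULT_VOLUME: &str = "rootfs";

impl BootState {
    /// Bootloader side: pick the volume to boot and consume the override,
    /// so an interrupted attempt falls back to the default on the next boot.
    pub fn select_boot_volume(&mut self) -> String {
        match self.nextboot.take() {
            Some(v) if self.volumes.contains(&v) => v,
            _ => DEFAULT_VOLUME.to_string(),
        }
    }

    /// Firmware side: called once the freshly booted firmware is healthy.
    /// The booted volume takes over the default name (assumption: the old
    /// default volume is removed in the process).
    pub fn confirm_boot(&mut self, booted: &str) {
        if booted == DEFAULT_VOLUME {
            return; // already the default, nothing to rename
        }
        self.volumes.retain(|v| v != DEFAULT_VOLUME && v != booted);
        self.volumes.push(DEFAULT_VOLUME.to_string());
    }
}
```

If power is lost between select_boot_volume and confirm_boot, the override is already gone and rootfs is untouched, which is the rollback behaviour described above.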
s
I was actually thinking about an idea that overlaps a bit with this: what if we had a boot counter in the U-Boot env that gets incremented by U-Boot on every reset? The kernel clears it when it boots successfully. U-Boot can then decide, if it sees that it has already reset e.g. 5 times, to start the other UBI partition. Not sure how often we would end up in this situation, especially if we had a read-only filesystem recovery mechanism.
c
U-Boot does have first-class support for boot counters, and it should be pretty easy to reverse the primary/secondary boot order after 5+ unsuccessful boots. It also supports putting this in the EEPROM, which might be preferred so as to avoid erasing a block of NAND on every boot.
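As a rough illustration of the boot counter idea, here is a small model of the decision logic. It is a sketch only: the names are hypothetical, the limit of 5 comes from the example above, and the real mechanism would live in U-Boot (its boot count limit feature uses bootcount, bootlimit and altbootcmd) rather than in Rust.

```rust
/// Which firmware slot the bootloader should try.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Slot {
    Primary,
    Secondary,
}

/// Persistent boot counter, e.g. stored in the environment or an EEPROM.
#[derive(Debug, Clone, PartialEq)]
pub struct BootCounter {
    /// Resets seen since the last successful boot.
    pub count: u32,
    /// Number of attempts allowed on the primary slot.
    pub limit: u32,
}

impl BootCounter {
    /// Bootloader side: count this reset, then choose a slot. Once the
    /// count exceeds the limit, the boot order is reversed.
    pub fn on_reset(&mut self) -> Slot {
        self.count = self.count.saturating_add(1);
        if self.count > self.limit {
            Slot::Secondary
        } else {
            Slot::Primary
        }
    }

    /// OS side: a successful boot clears the counter.
    pub fn on_successful_boot(&mut self) {
        self.count = 0;
    }
}
```

With limit set to 5, the first five resets try the primary slot and the sixth consecutive unsuccessful one switches to the secondary.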
Hey @svenrademakers -- looking at your commit just now, is it actually the hardware designer's recommendation to sleep 1 second after each power-on? The DT should be updated if so (and the kernel will enforce the delay by blocking the write)
s
1 sec is a bit much; probably a couple hundred ms is enough. I don't think there is a precise requirement. At least the goal here is to smooth out powering nodes in sequence
I like your changes a lot; the code will be much more straightforward 👏🏻
c
Haha I'm still trying to figure out how to do the same thing with the USB bus muxing 😅
But for more simplicity's sake: the "time for a given power supply to start delivering reliable power" is configured with startup-delay-us = <...>; in the DT, and Linux blocks "enabled" writes until that delay is up (so the tokio::fs::write(sys_path, node_value).await?; actually awaits until power is stable already). So the sleep(...) is only necessary if the smoothing delay is greater than the "stable power" delay.
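Put as arithmetic: since the write already blocks for the startup delay, any additional sleep only needs to cover the remainder of the smoothing delay. A minimal sketch, with a hypothetical function name and both durations supplied by the caller:

```rust
use std::time::Duration;

/// Extra time to sleep after an "enabled" write that has already blocked
/// for `startup_delay`. Returns zero when the startup delay alone covers
/// the desired smoothing delay.
pub fn extra_smoothing_sleep(smoothing: Duration, startup_delay: Duration) -> Duration {
    smoothing.saturating_sub(startup_delay)
}
```

For example, a 500 ms smoothing delay with a 200 ms startup delay leaves 300 ms to sleep, while a 100 ms smoothing delay leaves nothing.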
s
The idea here was that if multiple nodes are requested to be turned on, they are powered in sequence with some time between them so as not to stress the ATX too much. At least this was my understanding of your message in your PR. As a side effect, if only one node is turned on you will wait unnecessarily. Might fix that
c
Ah, my recommendation was just not to write to 2 different regulators in parallel.
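One way to get both properties (never two regulator writes in parallel, and no pointless wait when only one node is requested) is to loop over the nodes and only pause between consecutive writes. This is a synchronous sketch, not the daemon's code: the enable and sleep callbacks are hypothetical stand-ins for the sysfs write and the async sleep.

```rust
use std::io;
use std::time::Duration;

/// Enables the given nodes strictly one after another. `gap` is inserted
/// only between two consecutive nodes, so a single node never waits.
pub fn power_on_in_sequence<E, S>(
    nodes: &[u8],
    gap: Duration,
    mut enable: E,
    mut sleep: S,
) -> io::Result<()>
where
    E: FnMut(u8) -> io::Result<()>,
    S: FnMut(Duration),
{
    for (i, &node) in nodes.iter().enumerate() {
        if i > 0 && !gap.is_zero() {
            sleep(gap); // smoothing pause before every node except the first
        }
        enable(node)?; // stop at the first failing write
    }
    Ok(())
}
```

Because the order is simply the order of the slice, the same loop would also cover a configurable power-on order.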
s
gotcha
will update it then, can i put you as a reviewer on the PR when i open it?
c
Sure, though I'll mostly be reviewing it for behavior not really style (I'm a newcomer to idiomatic Rust)
Trying to understand what could be causing
s
sorry missed your message, i updated the PR with a comment, let me know if you need anything
c
On the powering-off bug, I did notice that closing the gpiochip dev caused all outputs to be released. Is the BMC daemon holding the GPIO file open?
s
Yes, it's holding an fd open for the duration of the program. I'm not really over the moon about this gpiod crate; I wouldn't be surprised if there is a bug here
I'm going to check the ioctl calls a bit to see if I can find some anomalies. Maybe this PR is the problem
c
Sounds like strace is the right call then 👍
We could save a lot of filesystem space by disabling the ALSA and LVM2 packages.
Yet more if we can disable eudev (not sure if anything depends on this though), but if not we should disable BR2_PACKAGE_EUDEV_ENABLE_HWDB
We can probably also disable keyutils
Before disabling those 4 things:
35M output/images/rootfs.erofs
60M output/images/rootfs.tar
24M output/images/rootfs.tar.gz
After disabling those 4 things:
27M output/images/rootfs.erofs
43M output/images/rootfs.tar
19M output/images/rootfs.tar.gz
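Working the numbers above through: the erofs image shrinks by about 23%, the tarball by about 28%, and the gzipped tarball by about 21%. A small helper for that arithmetic (hypothetical name, sizes taken as the whole "M" figures reported above):

```rust
/// Percentage saved when going from `before` to `after`, both in the same
/// unit. Returns 0.0 for a zero `before` to avoid dividing by zero.
pub fn percent_saved(before: u32, after: u32) -> f64 {
    if before == 0 {
        return 0.0;
    }
    (f64::from(before) - f64::from(after)) / f64::from(before) * 100.0
}
```

For instance, percent_saved(35, 27) covers the rootfs.erofs case.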
These are my exact config changes, if we want them upstream:
And here's a build I'm quite happy about:
1.0.2 running fully from a microSD; the firmware part of it should be feature-complete (aside from not being able to do OTA update)
I haven't implemented the "recovery" logic either, it just goes to Android fastboot if you long-press key1 on startup
j
If the swupdate package is not used, you could save somewhat more space. Even just leaving its web component disabled already saves some space
s
We spotted the ALSA package as well, and it is disabled in master
c
I'm down for cutting out swupdate, especially since the immutable rootfs makes updating really pretty simple (the update image can simply be the rootfs). I'd probably still want to look at whether swupdate can still be of help in some way before disabling it, but the update process without it is probably 20-30 lines worth of shell script.
j
Exactly. If kernel, modules + userland are in the rootfs, you simply do some magic with U-Boot envs to check that a successful flash actually starts properly on the next boot. And even if the kernel is not part of it because it is still separate, it will just work.
c
Yeah - the only thing I'm wondering is if swupdate does that magic for me, or if it's shell script time. And actually I sorta like the idea of the update system mounting the new rootfs image as a loop dev and then running a script inside the update, that way we don't have to be super careful to remain compatible with the previous version's update system hehe
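The loop-mount idea can be sketched as a plan of commands: mount the new image read-only via a loop device, run a script shipped inside it, then unmount. This only builds the command lines and does not execute anything. The script name update.sh and the function name are hypothetical, and it assumes the target's mount supports -o loop.

```rust
/// Builds the command lines for a loop-mount based update, in order.
/// Each inner vector is one argv. Nothing is executed here.
pub fn update_plan(image: &str, mountpoint: &str) -> Vec<Vec<String>> {
    // Hypothetical entry point shipped inside the new rootfs image.
    let script = format!("{}/update.sh", mountpoint);
    let to_argv = |args: &[&str]| -> Vec<String> {
        args.iter().map(|a| a.to_string()).collect()
    };
    vec![
        // 1. Mount the new image read-only through a loop device.
        to_argv(&["mount", "-o", "loop,ro", image, mountpoint]),
        // 2. Let the new version's own script perform the update.
        to_argv(&[script.as_str()]),
        // 3. Clean up the mount afterwards.
        to_argv(&["umount", mountpoint]),
    ]
}
```

Since the script comes from the new image, the running firmware only needs to know how to mount an image and run one file, which is what keeps old and new update logic decoupled.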