Random Tool Change (2 into 1 add-on) #16

Open
HippoDan opened this issue Oct 26, 2017 · 55 comments

Comments

@HippoDan

HippoDan commented Oct 26, 2017

Printing from OctoPrint frequently results in a random, uncommanded extruder switch. The forum also reports this problem on any USB connection, not just OctoPrint. I'm currently testing this from the SD card, which does not seem to have the same problem.

Firmware downloaded and flashed 10/25/17 using Arduino IDE 1.6.1.

@teejaydub

It seems like printing from the SD card is a decent workaround for that problem if printing from the SD card fits with your workflow. It doesn't for me, so, working on Ryan's assessment that the problem is caused by too much RAM being used by the second extruder, I tried disabling SD card support instead:

teejaydub@cea255d

(of course you also need to set NUM_EXTRUDER to 2, but I wanted to leave my repo ready for a pull request)

That frees up quite a bit of RAM - it goes down from 89% usage with one extruder + SD card support to 81% with two extruders + no SD card support.

Time will tell if this fixes the random extruder switches, but I got through one print OK and will report back as I get more experience with it.

@louvanna

Using cea255d firmware on a Rostock Max V3 (dual), and I have the same RAM percentages using Arduino IDE 1.8.4. I am 0 for 2 attempts: the controller stalls early in a print and eventually resets. No joy.
There is one other comment elsewhere about memory overflow, "Array memory overrun fix and ui_lang.h inclusion fix. #14", but I have yet to incorporate that.

@teejaydub

Hm, stalling and resetting could be a different problem? I haven't seen that at all.

@louvanna

louvanna commented Jan 19, 2018

I am using OctoPrint as the USB controller. To clarify: the LCD initially shows Extruder0, as intended. The job is only T0, so the random extruder change is still present. The delta stalls after the LCD changes to Extruder1, but the extruder stepper (1) is now active. So really the delta has stalled, but the firmware is still executing something.

@dbfrompw

On the Rostock V2, if set to 2 extruders, changing to the 2nd extruder locks it up and it needs rebooting. V91 works, but I can't get the inductive probe to work.

@louvanna

A rollback to v2 / v91 would mean forgoing the HE280 accelerometer, so nope.
Update: it works, somewhat.
After deep thought, in the fog of the 2017/2018 seasonal flu, I realized I forgot to erase the EEPROM prior to the new upload and to recalibrate.
Having remedied this omission, I can now print for an hour+ before a random tool change away from T0. Too bad my print is 7 hrs. While it printed with an empty T1, I simply moved my filament spool from T0 to T1 by hand. Many layers and hours later, the printer was found cold/idle, with its head buried in the print surface, and the LCD displaying 'Print Killed'.

Machine: dual 'Y' extruder kit, Rostock Max v3, firmware cea255d plus pull #14,
Controller: OctoPrint 1.3.6 as the USB controller on RPI.

@EliW

EliW commented Feb 21, 2019

Chiming in here: There is an identical/similar bug being discussed over on the OctoPrint Github (because I filed it a while back not knowing about this one):
OctoPrint/OctoPrint#3031

Basically I'm having this exact same issue. However, with a couple twists:

  • I'd been printing fine for months with a dual extruder over USB, when connected to a desktop and S3D. I only started to have this random extruder swap when moving to OctoPi.
  • I've recently had a different fault while debugging the extruder swap, where some garbled communication caused an issue 15hrs into a 16hr print (after I'd originally found a workaround of 'print only on T1', which got me past the 1hr point, but it seems only to a 15hr point).

@HippoDan

HippoDan commented Feb 21, 2019

I found a solution for this a while back: cut the data rate in half. Simply reducing the baud rate in OctoPrint to 115000 or whatever it is made the problem go away entirely. It would take a lot longer to transfer files to the SD card through OctoPrint, but I never do that, and print speed is unaffected.

I have successfully run a 38 hour print with that configuration.
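
For a rough sense of what that change does to the link, here is a back-of-the-envelope sketch. It assumes 8N1 framing (10 bits on the wire per byte), that the reduced rate is the standard 115200 baud, and that the original rate was 250000 baud; the thread states neither rate exactly, and the 30-byte average G-code line is only an illustrative guess.

```python
BITS_PER_BYTE_8N1 = 10  # 1 start bit + 8 data bits + 1 stop bit (assumed framing)


def bytes_per_second(baud: int) -> int:
    """Upper bound on payload bytes per second for a given baud rate."""
    return baud // BITS_PER_BYTE_8N1


def lines_per_second(baud: int, avg_line_bytes: int) -> float:
    """Upper bound on G-code lines per second, ignoring 'ok' round trips."""
    return bytes_per_second(baud) / avg_line_bytes


if __name__ == "__main__":
    for baud in (250000, 115200):  # assumed "before" and "after" rates
        print(baud, bytes_per_second(baud), round(lines_per_second(baud, 30)))
```

Under these assumptions the slower link still carries hundreds of short moves per second, so the main effect of the lower rate would be more time between incoming bytes rather than a slower print.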

@EliW

EliW commented Feb 26, 2019

So: New problems from this weekend (and this morning)

I've been printing 24/7 in preparation for an event. And twice now I've had extruder changes while printing from an SD card. The latest one this morning.

So now I'm completely flummoxed as to what to do. I did still have Octoprint connected via USB. But just hanging out monitoring. So if just 'being connected' is enough to trigger the bug ... OR ... It has nothing to do with Octoprint at all.

I'm seriously considering just reflashing my firmware to single extrusion and considering the whole dual extrusion addon to be complete junk because of this :(

And I was really looking forward to making more dual extrusion awesomeness....

@teejaydub

I haven't been looking into this much, because I have had persistent stringing problems, making dual extrusion spectacularly crappy even without random tool changes. BUT I just wanted to put this out there: it smells a lot like a buffer overflow I recently found in a totally different project. I would like to get some time to investigate what happens when the firmware runs out of RAM for incoming data - in particular, does something get overwritten, by a classic C array overrun or some other way?

Just in case someone else has the time to pursue that angle before I do.

I'm not sure how that would explain the issue when printing from SD.

@EliW

EliW commented Feb 27, 2019

I dunno, buffer overflow sounds like a legit issue here (and why I prefer coding in higher level languages now-a-days) ... Even on SD card, just perhaps for different reasons.

If you look in the other thread at my output, you'll see in one case where the Rostock sent a very garbled message, which totally looks like a buffer overflow situation (or at least a non-cleared buffer). So that's not a bad theory.

While USB is attached and connected, the printer is still constantly communicating with OctoPrint, sending the 'SD printing byte' messages, and temperature reports at times.

I will also add that the models this started happening on also happen to be very complicated models. IE: When I was printing single extrusion AFTER upgrading to the Y and it was working, it was over USB via S3D ... and I had zero issues.

But those, while sometimes LARGE prints (12"x12"x12") ... were very simple prints. Lots of straight lines, box shapes, etc. So the GCODE was actually very small.

When I started seeing the problems was not just when I switched to OctoPrint, but also when I started printing wargaming terrain for an upcoming con that I'm running. And now these were things like 'stone walls' where there isn't a single flat surface. Looking at the Gcode, it's HUGE, as it's a constant mess of "go 0.1mm this direction, now 0.1mm that direction, now 0.1mm that direction" ... to get the 'rippling' effect of stone.

Also FWIW: The two I've had that switched while printing on the SD card. Both did it at an obvious point of 'change' in the model. That may just be coincidence. But basically one was between two layers of brick perfectly. The other was right before a line of skull/wings happened, again, perfectly. May just be coincidence.

@teejaydub

That makes sense - the printer controller (Octoprint for you, MatterControl for me) throttles what it sends to the printer firmware, but it's doing it heuristically, not with any kind of flow control. When there's more data per layer (or per mm extruded, or whatever it's looking at), the firmware has more of a chance of getting behind, and when that buffer fills up, something bad happens. Reducing the baud rate acts as another throttling heuristic.

Looking through the firmware source, it looks like G-code commands are buffered, and if the buffer is full, it stops reading from the serial port - see the beginning of GCode::readFromSerial(). I wonder if, when it's full for too long, the Arduino serial buffer fills up because it's not getting polled, and then when we resume reading, either the Arduino has wrapped to the beginning of the serial buffer, or ignored incoming characters from the end - either way, commands get garbled. And a single "move" command getting garbled wouldn't produce much of a noticeable problem, but a "change tool" is more noticeable.

In other words, they've handled an overflow condition in the parsed G-code command buffer, but while waiting for that condition to clear, the Arduino serial buffer overflows, and then there's no provision to resync, resulting in serial data getting dropped mid-command.
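
The failure mode hypothesized above can be illustrated with a toy model. This is a sketch, not the Arduino or Repetier serial code: the `RxBuffer` class, its 16-byte capacity, and the drop-newest policy (one of the two behaviours guessed at above) are all assumptions chosen to keep the example small.

```python
from collections import deque


class RxBuffer:
    """Toy fixed-size receive buffer that discards new bytes when full."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pending = deque()
        self.dropped = 0

    def receive(self, byte: int) -> None:
        """Called for each incoming byte, whether or not anyone is reading."""
        if len(self.pending) >= self.capacity:
            self.dropped += 1  # buffer full: this byte is silently lost
            return
        self.pending.append(byte)

    def read_all(self) -> bytes:
        """Called by the reader once it resumes polling the port."""
        data = bytes(self.pending)
        self.pending.clear()
        return data


def replay(stream: bytes, stall_bytes: int, capacity: int):
    """Feed `stream` in while the reader stalls for the first `stall_bytes`."""
    rx = RxBuffer(capacity)
    for byte in stream[:stall_bytes]:
        rx.receive(byte)
    seen = rx.read_all()  # reader wakes up and drains the buffer
    for byte in stream[stall_bytes:]:
        rx.receive(byte)
    seen += rx.read_all()
    return seen, rx.dropped


if __name__ == "__main__":
    stream = b"G1 X10.1 Y5.2\nG1 X10.2 Y5.1\nT0\n"
    seen, dropped = replay(stream, stall_bytes=20, capacity=16)
    print(dropped, seen.split(b"\n"))
```

In this run four bytes from the middle of the second move are lost, so the reader sees a shorter, garbled line with nothing in the byte stream itself to flag the gap.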

So maybe instead of "if the command buffer is full, do nothing," that first check should be more like:

  1. if the command buffer is full, set a new inputOverflow flag, and also set the existing waitUntilAllCommandsAreParsed flag (which really means "wait until all commands are executed").
  2. waitUntilAllCommandsAreParsed will then drain commands without processing input
  3. when the command buffer is empty, waitUntilAllCommandsAreParsed is already cleared; additionally, at that time, if inputOverflow is set, call requestResend(), which resets serial input completely and resyncs.
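
The three steps above can be sketched as a small state machine. This is a Python toy model, not the firmware's C++: `input_overflow` and `wait_until_all_parsed` mirror the inputOverflow and waitUntilAllCommandsAreParsed flags named above, `request_resend()` stands in for requestResend(), and everything else (the class, the line-number bookkeeping, the capacity) is hypothetical.

```python
class CommandBuffer:
    """Toy model of the parsed G-code buffer with the proposed overflow handling."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.queue = []                     # parsed commands awaiting execution
        self.last_accepted = 0              # last line number accepted in order
        self.input_overflow = False         # the proposed new flag
        self.wait_until_all_parsed = False  # models the existing flag
        self.resend_requests = []           # line numbers asked for again

    def request_resend(self) -> None:
        # Stand-in for requestResend(): ask the host to restart right
        # after the last line that was accepted.
        self.resend_requests.append(self.last_accepted + 1)

    def offer(self, line_number: int) -> bool:
        """Host offers one line; return True if it was queued."""
        if self.wait_until_all_parsed:
            return False                    # step 2: draining, input ignored
        if len(self.queue) >= self.capacity:
            # Step 1: note that input may be lost, then start draining.
            self.input_overflow = True
            self.wait_until_all_parsed = True
            return False
        if line_number != self.last_accepted + 1:
            self.request_resend()           # ordinary out-of-sequence handling
            return False
        self.queue.append(line_number)
        self.last_accepted = line_number
        return True

    def execute_one(self) -> None:
        """The firmware finishes one queued command."""
        if self.queue:
            self.queue.pop(0)
        if not self.queue and self.wait_until_all_parsed:
            # Step 3: buffer is empty again; resync if input was skipped.
            self.wait_until_all_parsed = False
            if self.input_overflow:
                self.input_overflow = False
                self.request_resend()


if __name__ == "__main__":
    buf = CommandBuffer(capacity=2)
    print([buf.offer(n) for n in (1, 2, 3)])  # third line finds the buffer full
    buf.execute_one()
    buf.execute_one()
    print(buf.resend_requests)
```

In this model the skipped line is requested again only after the queue has fully drained, so nothing is parsed from a half-received command.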

I'll try this when I can, but anyone else is welcome to beat me to it!

teejaydub added a commit to teejaydub/Firmware that referenced this issue Mar 2, 2019
…ommands, and stop reading serial input, when we come back we resynchronize with serial to ensure nothing got missed. Might address seemecnc#16.
@teejaydub

I implemented that inputOverflow flag in my fork: https://github.com/teejaydub/Firmware. It compiles, but my printer's hot end is out of commission so I can't stress-test it. If anyone would like to give it a whirl, let me know how it goes!

@teejaydub

Have now smoke-tested that fork (with a change in Configuration.h to define NUM_EXTRUDER to 2 - keeping it mergeable) and it seems like I didn't break anything. It'll be hard to tell if it solves the problem until more people have tried it for more time and big prints.

@EliW

EliW commented Apr 8, 2019

Hey @teejaydub -- Haven't had a chance to test your fork yet. But I can say that I have another interesting anecdote. Been printing a bunch of random stuff without issue. Octopi was still plugged in via USB, but disconnected (because it keeps dropping connection anyway, and since I can't directly use it right now anyway shrug) ...

I ended up getting lazy this morning, didn't want to walk upstairs to take a SD card to the printer. So I connected to octopi, uploaded a file to the SD card. Then told Octopi to tell the printer to print via the SD card.

Had a tool change literally in the first layer, after like a month of printing without issue going purely SD card.

@teejaydub

That does seem weirdly similar to what you were doing! If the problem is a buffer overrun, anything that makes it send data faster could trigger it... but yeah, I don't know why that would.

I've made a branch at https://github.com/teejaydub/Firmware/tree/dual-extrusion that includes my changes, dejaybe87's buffer-related fixes, and a NUM_EXTRUDER = 2 in Configuration.h - just to make it easier to pull and go.

@teejaydub

I haven't seen this problem at all since installing that branch, including a flawless 8-hour two-color print last night. It looks fixed to me. Of course I'll post if I see evidence otherwise.

@EliW

EliW commented Oct 13, 2019

@teejaydub I just realized that 6 months flew past while I was "I should try that" ...

How is the branch working on your machine? I may finally in a week or two be in a place I could afford to try installing it and see how it works for me.

@EliW

EliW commented Oct 13, 2019

Also, looking at your codebase, I see that you not only have the fixes for the buffer issue ... but also disabled the SD card completely. Do you feel that's necessary as well? It would be nice to be able to do both.

@teejaydub

teejaydub commented Oct 13, 2019 via email

@EliW

EliW commented Oct 14, 2019

Of course. This morning my printer had a def/def issue. Again. So now I gotta debug that first.

@EliW

EliW commented Oct 29, 2019

Hey @teejaydub - Happy to take this off thread if you want (and/or if we annoy others). But it seems in trying to test this ... I've bricked myself. And not sure how to continue at the moment because ... stuff. Here's the short recap:

  1. Realized that I didn't have the arduino&matterhacker installed on current macbook. Went to do so.

  2. Had to install Arduino v1.8 (instructions say to use 1.6) and Matterhacker 2 (instructions say 1.6). Because the 1.6 versions now won't run on recent mac Catalina. They give you the "This software needs updated"

  3. I launched Arduino, was reminding myself about the process. Decided to test it by running the example EEPROM_get script. It ran successfully.

  4. I then was trying to launch matterhacker so that I could backup my EEPROM settings. It couldn't connect to the machine. So I tried turning the machine off/on. Came up bricked (solid lines on LCD). Tried turning it off/on a few times, same thing. Matterhacker would still not connect.

  5. No worries, guess that _get did more than get. I'll just go through the process. Ran the EEPROM_reset. Said it worked.

  6. Cool, went to compile/install the firmware. Tons of errors and a fail to compile. I thought, oh, guess I really needed 1.6

  7. Downloaded the 'newest java' version of 1.6 to see if it worked. It failed, cause, needed java and Catalina install seemed to blow mine away.

  8. Installed the JRE. Went to run that version of 1.6. It worked, but failed with a "Wrong CPU" error. I looked into that and it seemed to bring up issues of Xcode. I tried to launch Xcode and it prompted me to install the latest components. But that kept failing. More research told me that Catalina Xcode had some cert issues. I needed to install the update from App Store first.

  9. Install update in app store. (2 hours later)

  10. Same issues.

So I can't get anything to compile right now, and my existing firmware is borked.

When I try to use the Arduino 1.6 standalone ... Catalina says no.

When I try to use the Arduino 1.6 Java ... I get this CPU error:
Cannot run program "/Applications/Arduino 1.6.app/Contents/Java/hardware/tools/avr/bin/avr-g++": error=86, Bad CPU type in executable

And when I try to use Arduino 1.8 ... I get tons of errors during compile. And then this is the final set of errors when it fails:

Contents/Java/hardware/tools/avr/bin/avr-gcc-ar rcs /var/folders/qd/s69542js35zcsmdzy092_z8r0000gn/T/arduino_build_52873/core/core.a /var/folders/qd/s69542js35zcsmdzy092_z8r0000gn/T/arduino_build_52873/core/new.cpp.o
Linking everything together...
/Applications/Arduino.app/Contents/Java/hardware/tools/avr/bin/avr-gcc -Os -g -flto -fuse-linker-plugin -Wl,--gc-sections,--relax -mmcu=atmega2560 -o /var/folders/qd/s69542js35zcsmdzy092_z8r0000gn/T/arduino_build_52873/Repetier.ino.elf /var/folders/qd/s69542js35zcsmdzy092_z8r0000gn/T/arduino_build_52873/sketch/Commands.cpp.o /var/folders/qd/s69542js35zcsmdzy092_z8r0000gn/T/arduino_build_52873/sketch/Communication.cpp.o /var/folders/qd/s69542js35zcsmdzy092_z8r0000gn/T/arduino_build_52873/sketch/Eeprom.cpp.o /var/folders/qd/s69542js35zcsmdzy092_z8r0000gn/T/arduino_build_52873/sketch/Extruder.cpp.o /var/folders/qd/s69542js35zcsmdzy092_z8r0000gn/T/arduino_build_52873/sketch/HAL.cpp.o /var/folders/qd/s69542js35zcsmdzy092_z8r0000gn/T/arduino_build_52873/sketch/Printer.cpp.o /var/folders/qd/s69542js35zcsmdzy092_z8r0000gn/T/arduino_build_52873/sketch/Repetier.ino.cpp.o /var/folders/qd/s69542js35zcsmdzy092_z8r0000gn/T/arduino_build_52873/sketch/SDCard.cpp.o /var/folders/qd/s69542js35zcsmdzy092_z8r0000gn/T/arduino_build_52873/sketch/SdFat.cpp.o /var/folders/qd/s69542js35zcsmdzy092_z8r0000gn/T/arduino_build_52873/sketch/gcode.cpp.o /var/folders/qd/s69542js35zcsmdzy092_z8r0000gn/T/arduino_build_52873/sketch/motion.cpp.o /var/folders/qd/s69542js35zcsmdzy092_z8r0000gn/T/arduino_build_52873/sketch/ui.cpp.o /var/folders/qd/s69542js35zcsmdzy092_z8r0000gn/T/arduino_build_52873/libraries/SPI/SPI.cpp.o /var/folders/qd/s69542js35zcsmdzy092_z8r0000gn/T/arduino_build_52873/libraries/Wire/Wire.cpp.o /var/folders/qd/s69542js35zcsmdzy092_z8r0000gn/T/arduino_build_52873/libraries/Wire/utility/twi.c.o /var/folders/qd/s69542js35zcsmdzy092_z8r0000gn/T/arduino_build_52873/core/core.a -L/var/folders/qd/s69542js35zcsmdzy092_z8r0000gn/T/arduino_build_52873 -lm
/var/folders/qd/s69542js35zcsmdzy092_z8r0000gn/T//cchP5hXY.ltrans0.ltrans.o: In function `setTimer(unsigned long)':
<artificial>:(.text+0x364): undefined reference to `stepperWait'
<artificial>:(.text+0x368): undefined reference to `stepperWait'
<artificial>:(.text+0x36c): undefined reference to `stepperWait'
<artificial>:(.text+0x39c): undefined reference to `stepperWait'
<artificial>:(.text+0x3a0): undefined reference to `stepperWait'
/var/folders/qd/s69542js35zcsmdzy092_z8r0000gn/T//cchP5hXY.ltrans0.ltrans.o:<artificial>:(.text+0x3a4): more undefined references to `stepperWait' follow
collect2: error: ld returned 1 exit status
Multiple libraries were found for "SPI.h"
 Used: /Applications/Arduino.app/Contents/Java/hardware/arduino/avr/libraries/SPI
Multiple libraries were found for "Wire.h"
 Used: /Applications/Arduino.app/Contents/Java/hardware/arduino/avr/libraries/Wire
Using library SPI at version 1.0 in folder: /Applications/Arduino.app/Contents/Java/hardware/arduino/avr/libraries/SPI 
Using library Wire at version 1.0 in folder: /Applications/Arduino.app/Contents/Java/hardware/arduino/avr/libraries/Wire 
exit status 1
Error compiling for board Arduino/Genuino Mega or Mega 2560.

Any ideas for me? I seem to have gotten myself into a rough spot, given that I can't compile the firmware, and yet I blew away the existing firmware.

@teejaydub

Holy application stack catastrophe! I'm sorry for your loss.

I'm willing to help, but I can't find any reference for how to upload new firmware from a binary image - that is, without compiling it first. And this code base doesn't compile on the newer Arduino (I also got some error, yours looks familiar). So: no Arduino 1.6 = no firmware update.

Maybe SeeMeCNC Support or the forum has a better method for getting your machine back to where it was? I've done it for other Arduino-ish programs, but I think the available tools are specific to the processor.

Another wacky (and probably time-consuming) option would be to install an earlier version of MacOS in a VM??!

@EliW

EliW commented Oct 30, 2019

If you google about the 'error=86, Bad CPU type in executable' you find it mentioned ... a lot ... after the Catalina upgrade on different projects. Hopefully it's something that can be rectified by either Apple or Sun/Oracle/Java ... But I'm worried that it's a "Nope, use newer shit".

HOWEVER, You did just give me a facepalm ... because I totally have some really old macbooks laying around here. And I'm not sure if my wife has upgraded to Catalina yet on hers. (If not, I'm going to be all DON'T DO IT YET!) ...

Plus I've got some windows tablets I use for conference registration desks that I could probably use.

ANNNNND, heck. I could probably just compile it on the Raspberry Pi I have setup as the Octopi and do it that way couldn't I? Since the Arduino IDE comes for Linux as well.

OOOOR I could boot into my windows partition on my laptop (that I use just for playing some games), and do it there!

#sigh - I spent too much time trying to fix my laptop, versus just moving to another computer. (#ADHD hyper-focus can suck)

OK, not sure if tonight, or later this week now. But I'm going to get this installed and test it out.

@EliW

EliW commented Nov 11, 2019

@teejaydub - OK! I got my printer back up and running. (Mac didn't work ... Raspberry Pi couldn't talk to the printer ... Finally moved to a windows tablet, and it worked ...)

Couple things I'd love some input on while I'm testing this branch/fix for you:

  1. When compiling, I had some errors (mostly re-defined defines) ... that normal?

  2. Question: Does this clearing of EEPROM and recompiling firmware ... would that have blown away my stepper calibration? Anything else?

  3. So, I'm currently running a print through Octopi ... and it seems to be working. BUT ... it's doing a TON of retransmission and checksum errors

..... and now it just failed .... I'm going to attach a section of the communication log here: It finally failed with:
Recv: Error:WronN4759 G1um

brokenlog.txt

@EliW

EliW commented Nov 11, 2019

And, another update. I rebooted my octopi ... disabled the xserver I'd installed to use the IDE on it ... so it was back to 'normal', and tried to print again. This time it failed QUICKLY. Including a snippet from the end of the log on that one here as well.

But early in the print, it gave me this 'warning' as well:
"Communication timeout while printing, trying to trigger response from printer. Configure long running commands or increase communication timeout if that happens regularly on specific commands or long moves"

Any ideas what's up? BTW, I was using your dual-extrusion branch.

shorterprint.txt

@EliW

EliW commented Nov 11, 2019

I mean:

  1. It's having some huge communication issue now, constantly asking for lines to be resent and failing checksums.

  2. the final failure point seems to be some memory pointer issue. Where the error code and the next command get blasted on top of each other.

@teejaydub

Wow, yeah, it seems like there are a lot of potential issues there!

If I were you, I would get the plain vanilla firmware compiling and loaded and running first - step back from trying this change. That's at least one variable you can eliminate for starters.

Then if that doesn't work, I'd say take it over to the SeeMeCNC forums and/or their tech support directly.

If it does work, we can look more at figuring out what my branch is breaking for you.

@EliW

EliW commented Nov 11, 2019

Good idea

@EliW

EliW commented Nov 11, 2019

Random question(s) before I get deep ...

  1. Is it normal to have compilation errors (even with the base code?) (but non-fatal ones) ... I don't remember from 'the first time'. As a programmer, that always concerns me, and I'd assume that those are real code issues there, that someone should be fixing. Not 'left like that'. But, that's me :)
  2. Does the 1.6.x matter? I was using 1.6.latest (13 I believe). But should I be going back to 1.6.0?

As a side topic, I do worry that I might not 100% be able to detect stuff right here. Just because I'm seeing these issues when communicating over USB. I haven't tried printing on SD Card on this build. But whether it works, or doesn't, becomes somewhat irrelevant. Since it's over USB that I can see the errors. (I wish there was a way to get an SD card saved printing log).

And going back to the original firmware, means that I'll see issues over USB anyway, cause that was the problem.

However, they weren't "this bad". With the constant resends. So that's a good test in general.
Eli

@EliW

EliW commented Nov 11, 2019

Original firmware compiled.

Still had lots of errors ;)

Currently attempting a print via USB to watch the responses ... getting a lot of these still:

Recv: T:22.47 /0 B:42.53 /60 B@:255 @:0 T0:22.47 /0 @0:0 T1:22.47 /0 @1:0
Communication timeout while printing, trying to trigger response from printer. Configure long running commands or increase communication timeout if that happens regularly on specific commands or long moves.
Send: N9 M105*46
Recv: T:22.47 /0 B:42.72 /60 B@:255 @:0 T0:22.47 /0 @0:0 T1:22.47 /0 @1:0
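
For anyone reading these logs, the number after the `*` in a `Send:` line is the per-line checksum used by the RepRap-style serial protocol: the XOR of every byte before the `*`. The sketch below reproduces the value from the log line above; the function names are made up for illustration and are not OctoPrint's or the firmware's.

```python
def gcode_checksum(payload: str) -> int:
    """XOR of all bytes in the line, up to but not including the '*'."""
    checksum = 0
    for byte in payload.encode("ascii"):
        checksum ^= byte
    return checksum


def frame(line_number: int, command: str) -> str:
    """Build a numbered, checksummed line the way a host would send it."""
    payload = f"N{line_number} {command}"
    return f"{payload}*{gcode_checksum(payload)}"


def is_valid(line: str) -> bool:
    """Check a received line the way a receiver could before accepting it."""
    payload, star, received = line.rpartition("*")
    return star == "*" and received.isdigit() and gcode_checksum(payload) == int(received)


if __name__ == "__main__":
    print(frame(9, "M105"))          # matches the "Send:" line in the log above
    print(is_valid("N9 M105*46"))    # intact line
    print(is_valid("N9 M1O5*46"))    # one corrupted character
```

A single corrupted character changes the XOR, which is what lets the receiver reject the line and ask for it again.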

Print has started, and unlike with your branch -- I'm not getting non-stop resend requests. And it appears that the print is going to work without failure.

That being said ... it's printing over USB. So there is a non-zero chance of the original bug hitting and stopping my print.

But yes @teejaydub - It seems to be something about your dual-extrusion branch that is causing the communication issues (when it's trying to fix them grin)

@EliW

EliW commented Nov 11, 2019

Been looking at the code. It does seem that 'something' is happening in your code, on my machine, that is causing your line 350 in gcode.cpp to fire constantly, for almost every line.

As if the buffer was constantly full.

(Alternatively, something is triggering one of the lower requestResend()'s in the main code.)

A couple things I noticed, that may or may not be related:

  1. It seems like memory is very tight. Your version compiled (with SD card support) gives a warning that only 744 bytes are available for local vars on the Arduino. The regular version says 745 (oooh, 1 byte!). I assume that this is why you had been disabling the SD card support. Compiling without SD Card support opens it up to 1519 bytes of memory. (I have not tested your code running without that, partially because right now I'm waiting for the print I started on the original to finish, and partially because I'd hate to completely lose SD card as an option.) Also, 1 byte really (really) shouldn't be a huge issue.

  2. Looking at the code (I admit my cpp is rusty, very) ... I'm not sure I grok how the resend works. It appears that when you call requestResend() as you do, it just says "What was the last line number I ran? Oh right, X ... Ask for X+1 to be sent to me." It doesn't appear to have a "But what was the last number I was sent? Request up to that number" step, since the resend only sends that one line. I guess there is code in there (line 195) that perhaps keeps sending them. IE: If the last you ran was 100, but the connection is trying to send you 105, that line would go 'nope', and it would request 101 ... then (I assume, somehow) it would go back to seeing that 105 was sent to it, and go "nope: 102 please". But again, I can't quite grok the code to ensure that's exactly what it's trying to do there. It seems a bit weird that it would fall back to this odd loop, instead of having a section that goes "Oh hey, the printer sent me 105, but I need 101, 102, 103, 104, let's ask for those in sequence".
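
One way to picture the resend loop questioned above is as a "go back to N" scheme: the firmware only names the one line it needs, and the host rewinds and carries on from there. The sketch below is a toy simulation under that assumption. It is not taken from gcode.cpp or from OctoPrint, the `simulate` function and its log strings are hypothetical, and whether the real host rewinds exactly like this is an assumption.

```python
def simulate(total_lines: int, lost_once):
    """Replay a print of lines N1..N<total_lines> over a lossy link.

    `lost_once` holds line numbers that vanish the first time they are sent.
    Returns (executed line numbers, simplified log).
    """
    lost = set(lost_once)
    expected = 1      # firmware side: next line number it will accept
    next_to_send = 1  # host side: next line number it will transmit
    executed, log = [], []
    # A real host would also need timeouts for the case where the final
    # line is lost; this loop simply stops when the host runs out of lines.
    while expected <= total_lines and next_to_send <= total_lines:
        n = next_to_send
        next_to_send += 1
        if n in lost:
            lost.discard(n)  # the line never reaches the firmware
            log.append(f"lost N{n}")
        elif n != expected:
            # The firmware saw a gap: it names the line it needs, and the
            # host rewinds and continues in sequence from there.
            log.append(f"Resend:{expected}")
            next_to_send = expected
        else:
            executed.append(n)
            expected += 1
            log.append(f"ok N{n}")
    return executed, log


if __name__ == "__main__":
    executed, log = simulate(6, lost_once={2, 3})
    print(executed)
    print(log)
```

Under this assumption a single resend request is enough even when several lines went missing, because the host replays everything from the requested line onward rather than being asked for each line separately.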

Anyway, looking forward to hearing back. Also, I'm wondering if the un-merged buffer overflow bug here as a pull request might be having anything to do with these issues...

@teejaydub

Yes, that behavior doesn't surprise me. The dual-extruder code requires a duplicate of the whole structure for the second extruder, which might make it just run out of heap space at run time (they are dynamically allocated, if I remember correctly).

So yeah - I didn't try to get it to work with the SD card support, but I doubt you can do dual and SD on this machine. Just not enough memory on the microcontroller. :(

I believe I merged the un-merged buffer overflow bug into my patch, so that shouldn't be an issue.

I wonder if reducing the connection baud rate would help diagnose or otherwise provide useful info?

@EliW

EliW commented Nov 13, 2019

But literally upon compiling, it's a 1 byte difference between your code and the base code. Because all you are doing is keep an extra flag. Then deciding to stop reading from the port when full, and requesting a resend when done.

I can test it without the SD card code in it.

I'm also tempted to debug it some by putting in some responses across the USB to see which code is triggering the resend.

@teejaydub

But literally upon compiling, it's a 1 byte difference between your code and the base code. Because all you are doing is keep an extra flag. Then deciding to stop reading from the port when full, and requesting a resend when done.

Yes, that's true about my additional code. I meant the dual-extruder mode - I was never able to get that to compile and run with SD card support at the same time. But there's plenty of options for failure, so it might have been something else going wrong when I tried that!

I can test it without the SD card code in it.

It would at least be an interesting diagnostic, even if you really want the SD support.

I'm also tempted to debug it some by putting in some responses across the USB to see which code is triggering the resend.

That might be interesting too. One other thing to consider is whether your USB cable is flaky... I'm just trying to think of what else could cause that many transmission errors. Lowering the baud rate might be worth a try too.

@EliW

EliW commented Nov 14, 2019

Yes, that's true about my additional code. I meant the dual-extruder mode

Sure, but the dual-extruder mode works perfectly fine (other than the random tool changes & restarts if connected to USB). The fact that adding your '1 byte of code' suddenly made it make constant requests for resends seems 'odd'/'buggy' there.

(without SD code)

Yeah, hopefully tonight (EDT here) I'm going to sit down at the printer and just try a bunch of stuff. Try without the SD code like you have it running for a quick start.

(extra responses)

Yeah, that's the part that honestly is most interesting to me right now. Because looking at the readFromSerial() ... it's easy to 'assume' that with your code that line 350 which you added is causing the continual resend requests. BUT, it might not be. There may be a cascade effect, as lines 361, 406, 428, 440 ... all also can request a resend. So there's a chance that something 'else' about the code is cascading down.

So, I'll start playing with that, and see what I can find.

(baud rate)

BTW, you've mentioned this a few times. But I'm wondering what real benefit there might be. Sure, if there's a race condition happening, this might let the printer 'breathe'. But that doesn't really help solve the real issue.

(USB cable)

Waaaaay back when, this was one of the first things someone suggested when I had OctoPi start having issues. I tried a number of different ones back then, and no change. So I'm going to 'assume' that it didn't go bad since then. But one never knows.

@EliW

EliW commented Nov 18, 2019

@teejaydub - OK, I have stuff for you to look at. (PS. I tried 3 USB cables, including buying a brand new one. Same issues. It's not the cable)

First of all, you can see my special debugging branch, and specifically the gcode.cpp here to see where the debugging messages are being generated:
https://github.com/EliW/Firmware/blob/debug/Repetier%20Firmware/Repetier/gcode.cpp

Now, you can look at the output of the serial log of a print that I did, using your code with SD support off, with the crazy amount of resends (an issue the stock version doesn't have):
https://gist.github.com/EliW/0b1fc7a1f6a3f6833123a8ad3005b17d

OK. So you can let me know what's up here. But here is what I start seeing:

Line: Comment
25: As soon as N9 comes down ... it starts being out of buffer.
560: Ready to start up again, asks for N9 ... gets it, N10 is sent, and woah, out of buffer again! (? only can buffer 1 line of code?)
641: Fine for a while, then runs out of buffer - From here on things get consistent, with it running out of buffer, requesting a resend, then quickly running out of buffer again after just 1-2 lines
2648: First bad checksum & resend from the parser vs your code. After that it starts sending a lot of skip commands while reading more...
2710: This is the first case of what appears to be a buffer overflow error. Garbage in the output.
2921: Another overwritten line. I guess these could be an OctoPi logging bug. But these are what start to happen when the filament change happens in the original code as well
2940: This section is interesting. Because it claims a format error on N52, and then wrong checksum twice on resends, then finally accepts it. All 4 times, the exact same thing is sent. Then lower down it says 'Skip 52' a bunch of times in a row, when Octopi isn't trying to send N52 at all anymore
2987: Garbage
3151: Garbage (stopping mentioning these now)
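
For context on the "wrong checksum" entries around log line 2940: in the RepRap-style serial protocol, the number after `*` is, as far as I know, the XOR of every byte that precedes it on the line. A minimal host-side sketch of that check follows; the function names are made up for illustration and are not taken from Repetier or OctoPrint.

```python
from functools import reduce


def gcode_checksum(payload: str) -> int:
    """XOR of all bytes in the text that precedes '*'."""
    return reduce(lambda acc, b: acc ^ b, payload.encode("ascii"), 0)


def frame_line(line_number: int, command: str) -> str:
    """Build 'N<line> <command>*<checksum>' the way a host would send it."""
    payload = f"N{line_number} {command}"
    return f"{payload}*{gcode_checksum(payload)}"


def is_valid(framed: str) -> bool:
    """Recompute the checksum and compare it with the one on the line."""
    payload, sep, checksum = framed.rpartition("*")
    return bool(sep) and checksum.isdigit() and int(checksum) == gcode_checksum(payload)
```

A single lost or altered byte changes the XOR, so identical bytes failing the check twice and then passing (as at 2940) points at what the firmware actually received rather than at what the host sent.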

Anyway ... Lemme know if this helps at all.

I'm not sure how to parse it, because your code change is so minor but has such a big negative effect here, causing all these resend requests that officially were not needed.

One interesting point is that my EDEBUG 345 never fires. That one is supposed to fire after the buffer was full, while it waits for the buffer to empty completely, going ... wait for it ... wait for it ... But that never happens. It always goes straight from buffer overload to buffer empty.

......

Hrmmm, a couple interesting updates as I just went deeper in the code.

For one thing, it appears that the default buffer length ... is 2. Yup. 2. So it only ever allows it to have 1 command it's working on, and 1 extra. But that's the same in the stock version, and the stock doesn't constantly lose lines. So I'm still not sure why yours constantly panics about the buffer being full. The original seems to only ever have 2 things in the buffer, and yes, it's full. But it shrugs and moves along just fine. Until it doesn't and the error happens.

Basically, as far as I can tell, the way the original works is that it has a buffer of 2, so that it always has a line it's working on and a line queued up, so it doesn't have to wait. When it's finished one line ... one thread is going "OK, start parsing that next line". And another thread is going "OK, let's read that next line from serial now."

After it reads that line and puts it in the buffer, it's "yup, buffer full", and it simply ignores anything from Octopi. Doesn't read it. And so Octopi should be sitting there patiently waiting for it to be read and not moving on, because there has been no ack (Ok). It will only send the next line when it gets that ack.

So technically.... that feels like how it should work. And that shouldn't cause any issues. The issues would only happen if somehow Octopi kept sending requests without an ack. And the original code has that taken into account: with the original resend code, if it gets a line number that doesn't match what it should have gotten, it goes 'wait wait ... I was expecting X, not X+3'.
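
The sequence check described above can be sketched as a tiny state machine. This is only an illustration of the idea, not Repetier's actual code: the class name is hypothetical, and the reply strings are assumptions modeled on the messages quoted from the log.

```python
class LineTracker:
    """Toy model of a firmware-side line-number check (not Repetier's real code)."""

    def __init__(self, first_expected: int = 1):
        self.expected = first_expected

    def receive(self, line_number: int) -> str:
        if line_number == self.expected:
            # The line we were waiting for: accept it and move on.
            self.expected += 1
            return "ok"
        if line_number < self.expected:
            # Already handled: the host is repeating an old line.
            return f"skip {line_number}"
        # The host got ahead of us (a line went missing): ask for the gap.
        return f"Resend:{self.expected}"
```

Feeding it 52, then 55, then 53 yields an ok, a resend request for 53, and another ok, which is the "I was expecting X, not X+3" case.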

So yes. I think I can see where your code in fact causes tons of resend requests. Because the original code is meant to sit with a full buffer almost all the time, and just not respond to the serial stuff until it's ready. But your code now constantly makes it trigger a resend ... for something it already had, just because the buffer got full at one point?

If I grok this:

Original code:
Pi: N52
Max: Ok (buffer)
Pi: N53
Max: Ok (buffer)
Pi: N54
.... Max doesn't read that, or respond, until N52 is done ...
.... Once it's done executing N52....
Max: Ok
Pi: N55
...

Your code:
Pi: N52
Max: Ok
Pi: N53
Max: Ok
Pi: N54
... Max now sits there waiting, just like before ... But executes N52, and then N53 ... then
Max: Resend 53 (because your resend request happens before it reads the N54 on the serial port)
Max: Ok (because it just read the N54)
Pi: N53 again
Max: Skip 53 (Why did you send that? I'm parsing N54)
... etc ...

-- I may be totally off there. Just trying to work through this code myself. And if I understand it. I'm not sure if your code helps in any way. Though you said that it did for you...

@EliW

EliW commented Nov 18, 2019

> I found a solution for this a while back. Cut the data rate in half. Simply reducing the baud rate in octoprint to 115000 or whatever it is, made the problem go away entirely.

@HippoDan -- Mind giving me some advice here? I wanted to take your approach and cut the baud rate.

But how?

If I just change the baudrate in Octopi ... it fails to connect at anything except 250000

I went into the firmware code, and changed the baudrate there to 115200

But Octopi still will only connect at 250000

How did you get it to connect at a different speed?

@EliW

EliW commented Nov 18, 2019

@HippoDan Nevermind ... figured it out. Seems that you have to clear the EEPROM before uploading, if you want some of the config things like baudrate to stick. (Or theoretically change the EEPROM_MODE to another number.)

Talking at 115200 now. Will see how that goes. Also I have the array overflow bug merged in.

@teejaydub

Thanks, that's a lot of good work!

So, here's what I'm sure of:

  1. I was having the problem of random switches to the second extruder during printing, sometimes.
  2. In the what, 5 months since I started using this patch, I've been printing a lot (and sometimes on aggressive deadlines, which as you know manipulate the quantum field in certain statistical bug-revealing ways!) and I haven't had the problem.
  3. I'm using MatterControl and you're using OctoPrint; maybe the two are doing something different on the sending side?

And here's my read of the code and your situation:

  1. Repetier's buffering approach says "Read from the serial port often, except if our buffer of commands is full, in which case stop reading from the serial port until we've processed those commands." (if(bufferLength >= GCODE_BUFFER_SIZE) return; // all buffers full)
  2. That buffer seems to be full after holding one command (#define GCODE_BUFFER_SIZE 1).
  3. If you don't read the serial port "often enough," it overflows and characters will be lost. (I guess I'm not 100% positive of this, but I think I verified that when I was working on this before, and it has to be true for some value of "often enough" with a finite amount of input buffer.)
  4. Lost characters could result in the problem behavior.
  5. You're absolutely right, the new behavior is admittedly chatty, but I don't see any other easy option given the memory constraints. Increasing GCODE_BUFFER_SIZE might reduce the chattiness, but one GCode command structure is what, maybe 128 bytes for every command no matter how simple, so there's not a lot of room to store more.
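
To put rough numbers on item 5, here is a back-of-the-envelope sketch. It assumes the controller has 8 KB of SRAM (an assumption, check your own board), takes the 81% usage figure reported earlier in this thread for two extruders without SD support, and uses the guessed 128 bytes per command from above. The stack reserve is a made-up parameter.

```python
SRAM_BYTES = 8 * 1024   # assumption: 8 KB of SRAM on the controller
STATIC_USAGE = 0.81     # from the thread: two extruders, no SD card support
COMMAND_BYTES = 128     # rough per-command size guessed above


def free_bytes(sram: int = SRAM_BYTES, usage: float = STATIC_USAGE) -> int:
    """Bytes not claimed by statically allocated data."""
    return sram - round(sram * usage)


def extra_command_slots(reserve_for_stack: int, command_bytes: int = COMMAND_BYTES) -> int:
    """How many more buffered commands fit once some stack space is set aside."""
    return max(0, (free_bytes() - reserve_for_stack) // command_bytes)
```

Under those assumptions `free_bytes()` comes to 1556, and all of it has to cover the stack and any other runtime needs, so only a handful of extra command slots could fit at best.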

Now, maybe all the re-sending is causing other stress, and there could be other ways for the buffer to leak. And there might be a simpler approach to deal with backlog on the serial port. But I think the core problem is how to keep up with input on the serial port under tight memory constraints, and the existing solution seems to be to rely on processing commands in time, which at some point is going to fail if the client is sending data fast enough.

Yet another way to think of it: all those EDEBUG 337: Command Buffer Full lines in your log represent times when the Max will drop characters if the serial buffer overflows, and the longer the runs of them, the greater the chance that'll happen. (Lowering the baud rate imposes an artificial speed limit that decreases the chances, but it's still probabilistic code.)
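
A quick way to see why the baud rate matters here: with 8N1 framing each byte costs 10 bits on the wire, so the time a host needs to fill an unread receive buffer scales inversely with the baud rate. The sketch below is plain arithmetic; the 64-byte buffer size in the usage note is a hypothetical value, not a figure taken from the firmware.

```python
def bytes_per_second(baud: int, bits_per_byte: int = 10) -> float:
    """8N1 framing: 1 start bit + 8 data bits + 1 stop bit per byte."""
    return baud / bits_per_byte


def ms_until_overflow(baud: int, rx_buffer_bytes: int) -> float:
    """Time for a continuously sending host to fill a receive buffer nobody reads."""
    return 1000.0 * rx_buffer_bytes / bytes_per_second(baud)
```

For a hypothetical 64-byte receive buffer, `ms_until_overflow(250000, 64)` is about 2.56 ms, while 115200 baud gives roughly 5.6 ms. The window more than doubles, but it never becomes unlimited, which fits the "still probabilistic" point above.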

That's all I've got for the moment... I think it would be interesting to know if there's any difference with MatterControl, though that may not be a solution for you.

Wait, one more thought... Max currently sends ok as soon as it has parsed a command. If it delayed the ok response for the last command in the buffer until there was room for more, OctoPi would presumably not get ahead of the Max's command buffer size. That might reduce the chatter and re-sends while still dealing definitively with the input serial buffer.
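
The delayed-ok idea can be modeled in a few lines. This is a toy simulation under the assumption that the host sends strictly one line per ok; the class and method names are hypothetical and nothing here is taken from the firmware.

```python
from collections import deque


class DelayedAckFirmware:
    """Toy model: hold back the 'ok' for the line that fills the command buffer."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.queue = deque()
        self.ok_owed = False  # True while an 'ok' is being held back

    def receive(self, line: str) -> bool:
        """Accept a line; return True if 'ok' goes out right away."""
        if len(self.queue) >= self.capacity:
            raise RuntimeError("host sent a line without waiting for ok")
        self.queue.append(line)
        if len(self.queue) < self.capacity:
            return True
        self.ok_owed = True  # buffer is now full: delay the ack
        return False

    def finish_one(self) -> bool:
        """Finish the oldest command; return True if a held 'ok' is released."""
        self.queue.popleft()
        released, self.ok_owed = self.ok_owed, False
        return released
```

With a capacity of 2, the second line's ok is withheld until the first command finishes, so a well-behaved host never has a line in flight while the buffer is full.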

@EliW

EliW commented Dec 16, 2019

So, just an update here ... I recompiled back to base code. And then turned my speed down a notch.

Also, I went into Octopi and turned up the 'long command' wait and a few others. So that Octopi wouldn't go "ugh, that took too long" and fail.

Have had zero issues since then.

I think the problem may not be anything really to do with the dual extruder anyway, or buffering (since the buffer code looks correct in general: it only buffers 1 command while it's running, reads another one so it's ready, and then halts).

I think the problem is just a combination of so many commands at speed hitting the machine, combined with Octopi aggressively trying to decide that "Ya know, I've waited 3 seconds for that last command to finish. That's preposterous, I'm moving on" .... when sometimes a long slow extrude can take a while, or more importantly, a filament change can.
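
As a rough sanity check on that theory: a move's duration is approximately its distance divided by its feedrate, so a slow move can easily outlast a short host-side timeout. The sketch below ignores acceleration, and the 3-second figure is just the number used loosely above, not a documented OctoPrint default.

```python
def move_seconds(distance_mm: float, feedrate_mm_per_min: float) -> float:
    """Approximate duration of a move, ignoring acceleration."""
    return 60.0 * distance_mm / feedrate_mm_per_min


def outlasts_timeout(distance_mm: float, feedrate_mm_per_min: float, timeout_s: float) -> bool:
    """True if the move would still be running when the host gives up waiting."""
    return move_seconds(distance_mm, feedrate_mm_per_min) > timeout_s
```

For example, a 100 mm move at F600 takes about 10 seconds, well past a 3-second wait, while a 10 mm move at F6000 finishes in a tenth of a second.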

Anyway, the combo of slowing down the baud rate and telling OctoPi to be more patient fixed it.

May also explain why I never had issues with S3D direct. Because, perhaps, it doesn't have nearly the aggressive timeouts set.

@teejaydub

teejaydub commented Dec 17, 2019 via email

@EliW
Copy link

EliW commented Jan 21, 2020

@teejaydub - So, an update. Been printing fine now for quite a while. Until:

Just recently went to do a dual print. And on the 3rd or 4th tool change, I'm getting an Apos X Steps (or sometimes Y Steps) error. Tried different slices, different models. And now it's consistently happening. Tried pushing fresh firmware, still happens, USB or SD Card.

Is your v3 Dual up and running? Mind trying one of my slices if so to see if you have the same issue? Or if something is going wrong with my machine?

@teejaydub

Hm, yup, that sure looks weird! Sure, I could try a gcode file.

@EliW

EliW commented Jan 21, 2020

Here's a link to a file: https://d.pr/f/Xr3HBY

It's only a small 15 minute print, sliced for 0.5mm nozzle, 0.3mm layers (quick), and at 200C for PLA

@EliW
Copy link

EliW commented Jan 24, 2020

@teejaydub - Looks like the print is working now. After working with the folks in the Repetier github on the problem. It all came down to buffer overflows with a complicated model.

I did your trick, and recompiled to remove SD Card Support. Freeing up a lot of RAM. And now the print works.

@teejaydub

Ohh, that's interesting! I'm glad it's working for you. And sorry for the delay - I was going to try it tonight, honest. 😅

@Baenwort

So as a new dual extruder, v3, and OctoPrint user I'm trying to follow this and avoid what seems to be a known problem combo. I'm not using a Pi but rather an old 4C Haswell PC running Ubuntu.

Do I have your process understood?
1) Turn down baud rate in OctoPrint to 115200
2) Turn down baud rate on the LCD to 115200 and save to EEPROM
3) Turn up Timeouts and Command Wait timers (I'm doubling them) in OctoPrint

I'm comfortable flashing pre-compiled firmware but I don't see any that aren't several years old. There are several mentions in this thread about customizing the firmware but not enough info for me to feel able to follow.

@Baenwort

Baenwort commented Feb 9, 2020

> @teejaydub - Looks like the print is working now. After working with the folks in the Repetier github on the problem. It all came down to buffer overflows with a complicated model.
>
> I did your trick, and recompiled to remove SD Card Support. Freeing up a lot of RAM. And now the print works.

So I tried the baud rate and OctoPrint settings and it is still having problems. @EliW would you be willing to share the dual extruder, no SD card firmware with me? I think I can follow the SeeMeCNC flashing instructions if I had somewhere other than their repo to aim at.

@EliW

EliW commented Feb 10, 2020

@Baenwort Hey there! :)

So yes. You have the process down ... with the exception (and last point I got to) of needing to just straight up remove SD Card support. Which fixed the issue. (You can see more info here: repetier/Repetier-Firmware#903)

Happy to share the firmware however:

  1. I honestly have no idea how to flash the firmware without the compiler (and previous discussions seemed to imply that was the only way).

  2. I'll have to commit my own changes that I'm running up to my GitHub to give you the code. But I'm super busy right now and that code is on an old Windows box I have shut down now. So it will take me a few days to get around to it.

However:

I think @teejaydub might still have a branch of the code up with the 'no SD card' fix in place ... if he can link you to that quicker than I can get my code up.

@PartDaddy
Collaborator

PartDaddy commented Feb 10, 2020 via email

@Baenwort

@teejaydub and @EliW

From looking at https://seemecnc.dozuki.com/Guide/RAMBo+Control+Firmware/50 I think I can follow along. I just need directions to somewhere other than the SeeMeCNC repo? I'm happy to take whichever of you links their repo first. ;)

It does sound like PartDaddy would accept your pull request if it is made universal. Browsing around EliW's repos I see that at one point he had an 'if two extruders, turn off SD card' version, but I'm not confident enough to just grab that or to try to build my own.

My level of coding skills ends at game mods and AV stuff. Nothing can physically break or burn down when I fail. ;) This is a level of stakes where I'd rather follow someone else's lead.

@teejaydub

@Baenwort , you should be able to get your source from the branch I mentioned way up above:

> I've made a branch at https://github.com/teejaydub/Firmware/tree/dual-extrusion that includes my changes, dejaybe87's buffer-related fixes, and a NUM_EXTRUDER = 2 in Configuration.h - just to make it easier to pull and go.

That branch also has SD card support removed.

It does seem like a good idea to integrate this back into the mainline for the dual-extruder case. I'd be happy to make that change - so that dual-extruder implies no SD support. I won't have time to make it and test it in the next couple of weeks at least, though.

@Baenwort

Baenwort commented Mar 7, 2020

> @Baenwort , you should be able to get your source from the branch I mentioned way up above:
>
> > I've made a branch at https://github.com/teejaydub/Firmware/tree/dual-extrusion that includes my changes, dejaybe87's buffer-related fixes, and a NUM_EXTRUDER = 2 in Configuration.h - just to make it easier to pull and go.
>
> That branch also has SD card support removed.
>
> It does seem like a good idea to integrate this back into the mainline for the dual-extruder case. I'd be happy to make that change - so that dual-extruder implies no SD support. I won't have time to make it and test it in the next couple of weeks at least, though.

The branch you linked, does it include that resending code that was tried for a while up above?

I've tried the dual-extrusion one you linked and I'm getting messages in my serial terminal I'm not used to seeing about resending lines very frequently.

@Baenwort

Baenwort commented Mar 7, 2020

And now the first two prints I tried failed with this error in OctoPrint: "Your printer's firmware reported an error. Due to that the ongoing print job will be cancelled. Reported error: Wrong chN21046"

serial0307fail.log
octoprint0307fail.log
clip4ABSgcode.txt

Video: https://youtu.be/AMVrcJu1wf4

@EliW @teejaydub Do either of you have a version that has removed the SD card support and doesn't include the buffer checking code?
