Optimize breaking up data for encoding method in Code128 dynamic mode #127

Open
wants to merge 2 commits into
base: master

Conversation

Badkoubehei

To minimize the encoded length, only numeric sequences of 3 or more digits should be encoded in mode C; shorter sequences should be encoded in mode A or B. In addition, if the sequence length is odd, it is better to encode the first digit in mode A or B rather than the last digit, because encoding the last digit in mode A or B requires an additional mode switch.

@barnhill
Owner

What do you think of this @binki ?

Encoding the first vs. the last will require the same number of mode switches, unless I'm missing something.

@barnhill self-assigned this May 27, 2021
@Badkoubehei
Author

@barnhill, no, encoding the last digit will need one more mode switch (between CODE_A and CODE_C). As an example, please check this input: GAA01BB0880001
Without this PR, it is encoded as shown in the following picture:
image
After this PR, it is encoded as follows:
image
As you can see, it uses two fewer symbols: one is saved by not using mode C for numeric sequences shorter than 3 digits, and the other by encoding the first digit of an odd-length numeric sequence in mode A or B (not the last).

@binki
Collaborator

binki commented May 27, 2021

@barnhill

> What do you think of this @binki ?
>
> Encoding the first vs. the last will require the same number of mode switches, unless I'm missing something.

I have not yet looked at the implementation, so I am not sure if my concerns have been addressed. However, here are my thoughts! I will try to check back again (if I remember x.x).

It sounds reasonable to try to optimize the number of switches for size. This could actually help a lot with getting the barcode to fit on screen or in a document. The biggest concern I have is that there might be a compatibility concern for an application which accidentally relies on the existing behavior. For example, maybe a scanner which is configured to ignore barcodes which use mode C (not sure if that is a thing). Hopefully all people consuming barcodes work with them as strings and do not know how they are actually encoded ;-).

The last versus first seems to be an optimization for when the 3-number sequence is at the very end of the barcode. If the last three characters of a barcode are a run of digits following nondigits encoded using mode A or B, you could encode that as either “mode-C 01 mode-A 2” or you could encode that as “0 mode-C 12”.

However, if the 3-digit run is in the middle of a barcode, then I think switching codes would cost you more. You could encode “A123B” as “start-A A 1 2 3 B” (6) or “start-A A 1 mode-C 23 mode-A B” (7). Thus, if you switch for a 3-digit sequence which is followed by a switch back to mode A or B, then switching to mode C increases the total width of the barcode unnecessarily. However, if you have a run of 4 digits followed by a mode switch, then you do not increase the width of the barcode by switching modes—but you do not gain anything until you have a run of at least 6 digits:

  • Surrounded 3 digits “A123B”
    • Switching: “start-A A 1 mode-C 23 mode-A B” (7)
    • No switching: “start-A A 1 2 3 B” (6)
  • Surrounded 4 digits “A1234B”
    • Switching: “start-A A mode-C 12 34 mode-A B” (7)
    • No switching: “start-A A 1 2 3 4 B” (7)
  • Surrounded 5 digits “A12345B”
    • Switching: “start-A A 1 mode-C 23 45 mode-A B” (8)
    • No switching: “start-A A 1 2 3 4 5 B” (8)
  • Surrounded 6 digits “A123456B”
    • Switching: “start-A A mode-C 12 34 56 mode-A B” (8)
    • No switching: “start-A A 1 2 3 4 5 6 B” (9)
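
The counts in the list above can be reproduced with a little arithmetic. The following is only a sketch under the assumptions used in this comment: one symbol per character in mode A or B, one symbol per digit pair in mode C, and one symbol per mode change. The function name `inner_run_costs` is made up for illustration and is not part of the library.

```python
def inner_run_costs(n_digits):
    """Symbols used by a run of n digits that has mode-A/B characters on both sides.

    Returns (switching, no_switching). Only the run and its two mode-change
    symbols are counted, not the start symbol or the surrounding characters.
    """
    odd_digit = n_digits % 2       # a leftover digit stays in mode A/B
    pairs = n_digits // 2          # each mode-C symbol carries two digits
    switching = odd_digit + 1 + pairs + 1   # [digit] + mode-C + pairs + mode-A
    no_switching = n_digits                 # one symbol per digit
    return switching, no_switching


for n in range(3, 8):
    switching, no_switching = inner_run_costs(n)
    print(n, "digits:", switching, "vs", no_switching)
```

Adding 3 for “start-A”, “A” and “B” gives the totals in the list, e.g. (4, 3) for 3 digits becomes 7 vs 6.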

@Badkoubehei Please let me know if I am missing something!

@binki
Collaborator

binki commented May 27, 2021

@Badkoubehei

> As you can see, it uses two fewer symbols: one is saved by not using mode C for numeric sequences shorter than 3 digits, and the other by encoding the first digit of an odd-length numeric sequence in mode A or B (not the last).

After thinking about it, I think these rules would make sense and result in the most optimal mixed mode barcodes:

  • Start or end digit sequence must be at least 4 digits long (e.g., “start-C 12 34 mode-A A” (5) is cheaper than “start-A 1 2 3 4 A” (6)) and aligned to either the start or end.
  • Inner digit sequences must be at least 6 digits long.
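
As a rough illustration of the two rules above, the data could be broken up like this. It is a sketch, not the code in this PR: `plan_digit_runs`, `EDGE_MIN` and `INNER_MIN` are hypothetical names, and all-digit input (a run touching both ends) is simply treated as an edge run because the rules above do not cover it separately.

```python
import re

EDGE_MIN = 4    # run touches the start or the end of the data
INNER_MIN = 6   # run has mode-A/B characters on both sides


def plan_digit_runs(data):
    """Return (run, touches_edge, use_mode_c) for every maximal digit run."""
    plan = []
    for match in re.finditer(r"[0-9]+", data):
        touches_edge = match.start() == 0 or match.end() == len(data)
        minimum = EDGE_MIN if touches_edge else INNER_MIN
        run = match.group()
        plan.append((run, touches_edge, len(run) >= minimum))
    return plan


print(plan_digit_runs("GAA01BB0880001"))
```

For the input from earlier in this thread, the 2-digit run “01” stays in mode A/B and the 7-digit run at the end goes to mode C.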

Unfortunately, an earlier commit I made disturbed Code128.cs quite a bit, so this PR has merge conflicts and will need some work x.x. If you could merge in master (or maybe it is easier to reimplement from scratch and force push?), I will review the actual code. Sorry for the hassle and thanks for the contribution!

@Badkoubehei
Author

Badkoubehei commented May 27, 2021

@binki

> • Start or end digit sequence must be at least 4 digits long (e.g., “start-C 12 34 mode-A A” (5) is cheaper than “start-A 1 2 3 4 A” (6)) and aligned to either the start or end.

Yes, you are right about the length of sequence. But if I have got your meaning about "aligned to either the start or end" right, I think aligning to start or end matters for odd-length sequences. See these examples:

12345A
Align to start => start-C 12 34 mode-A 5 A (6)
Align to end => start-A 1 mode-C 23 45 mode-A A (7)

A12345
Align to start => start-A A mode-C 12 34 mode-A 5 (7)
Align to end => start-A A 1 mode-C 23 45 (6)

It shows that if the sequence is at the start of string, aligning to start is more optimal and if sequence is at the end, aligning to end is better.

> • Inner digit sequences must be at least 6 digits long.

Yes, you are right. And in this case, aligning to start or end does not matter:

A1234567A
Align to start => start-A A mode-C 12 34 56 mode-A 7 A (9)
Align to end => start-A A 1 mode-C 23 45 67 mode-A A (9)
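
The two alignments used in the examples above can be sketched as a small helper. This is only an illustration: `split_run` is a hypothetical name and the function covers just the splitting of a digit run, not the mode-switch symbols around it.

```python
def split_run(digits, align):
    """Split a digit run into (leading_single, mode_c_pairs, trailing_single).

    align="start": pairs begin at the first digit, an odd digit is left at the end.
    align="end":   pairs finish at the last digit, an odd digit is left at the front.
    """
    odd = len(digits) % 2
    if align == "start":
        leading, body, trailing = "", digits[:len(digits) - odd], digits[len(digits) - odd:]
    else:
        leading, body, trailing = digits[:odd], digits[odd:], ""
    pairs = [body[i:i + 2] for i in range(0, len(body), 2)]
    return leading, pairs, trailing


print(split_run("12345", "start"))   # pairs 12 34, then the single 5
print(split_run("12345", "end"))     # the single 1, then pairs 23 45
```

For a run at the start of the data, "start" lets the barcode begin in mode C; for a run at the end, "end" avoids switching back to mode A/B for the last digit.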

> Unfortunately, an earlier commit I made disturbed Code128.cs quite a bit, so this PR has merge conflicts and will need some work x.x. If you could merge in master (or maybe it is easier to reimplement from scratch and force push?), I will review the actual code. Sorry for the hassle and thanks for the contribution!

I checked your commit. I think re-implementing my commit is easier than resolving the conflicts :). I will try to do it.

@binki
Collaborator

binki commented May 27, 2021

@Badkoubehei

> @binki
>
> > • Start or end digit sequence must be at least 4 digits long (e.g., “start-C 12 34 mode-A A” (5) is cheaper than “start-A 1 2 3 4 A” (6)) and aligned to either the start or end.
>
> Yes, you are right about the length of sequence. But if I have got your meaning about "aligned to either the start or end" right, I think aligning to start or end matters for odd-length sequences. See these examples:
>
> 12345A
> Align to start => start-C 12 34 mode-A 5 A (6)
> Align to end => start-A 1 mode-C 23 45 mode-A A (7)
>
> A12345
> Align to start => start-A A mode-C 12 34 mode-A 5 (7)
> Align to end => start-A A 1 mode-C 23 45 (6)
>
> It shows that if the sequence is at the start of string, aligning to start is more optimal and if sequence is at the end, aligning to end is better.

Correct! Aligning to the start or end only matters for odd-length sequences. I just wanted to clearly state that odd-length sequences of 5 or more digits at the beginning of the barcode need to have the barcode start in Mode C rather than switching to it after the first character. It is similar to the optimization at the end of the barcode. I wanted to make sure that this optimization was considered:-).

> > • Inner digit sequences must be at least 6 digits long.
>
> Yes, you are right. And in this case, aligning to start or end does not matter:

Correct.

> I checked your commit. I think re-implementing my commit is easier than resolving the conflicts :). I will try to do it.

Thanks!

@barnhill
Owner

This is awesome 😎. I'm pretty stoked about the collaboration here!!

@rob313663
Contributor

rob313663 commented May 27, 2021 via email

@Badkoubehei
Author

Badkoubehei commented May 27, 2021

@rob313663

> Hi, there are Shift A and Shift B that allow switching between Code A and Code B too. Never used them myself but they are said to only shift for the next code word.

Great! I did not know about that. It may change the algorithm!

1234A23
Start-C 12 34 Code-A A 2 3 (7)
Start-C 12 34 Shift-A A 23 (6)
Here the ending sequence of numbers has length 2, but using the Shift-A makes it optimal to encode using code-C.

1234A567
Start-C 12 34 Shift-A A 56 Shift-A 7 (8)
Start-C 12 34 Code-A A 5 6 7 (8)

1234A5678
Start-C 12 34 Shift-A A 56 78 (7)
Start-C 12 34 Code-A A Code-C 56 78 (8)

12A5678
Start-C 12 Shift-A A 56 78 (6)
Start-A 1 2 A Code-C 56 78 (7)

I will try to consider it in my implementation.
@barnhill what do you think?

@rob313663
Contributor

rob313663 commented May 27, 2021 via email

@Badkoubehei
Author

Hi @rob313663
Thanks for sharing your information. So we can ignore optimization of the shift code.
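
As background only (the text of the email reply is not shown in this thread, so this is not a summary of it): in the Code 128 symbol table, value 98 acts as Shift only in sets A and B, while in set C the same value is the digit pair 98. A sketch of values 96 to 102 makes this visible; the names `SPECIAL_VALUES` and `sets_offering` are made up for illustration.

```python
# Meaning of symbol values 96-102 in each Code 128 code set.
SPECIAL_VALUES = {
    96:  {"A": "FNC3",   "B": "FNC3",   "C": "96"},
    97:  {"A": "FNC2",   "B": "FNC2",   "C": "97"},
    98:  {"A": "SHIFT",  "B": "SHIFT",  "C": "98"},
    99:  {"A": "CODE_C", "B": "CODE_C", "C": "99"},
    100: {"A": "CODE_B", "B": "FNC4",   "C": "CODE_B"},
    101: {"A": "FNC4",   "B": "CODE_A", "C": "CODE_A"},
    102: {"A": "FNC1",   "B": "FNC1",   "C": "FNC1"},
}


def sets_offering(name):
    """Return the code sets in which some symbol value has the given meaning."""
    found = set()
    for meanings in SPECIAL_VALUES.values():
        for code_set, meaning in meanings.items():
            if meaning == name:
                found.add(code_set)
    return sorted(found)


print(sets_offering("SHIFT"))   # Shift is not available while in set C
print(sets_offering("FNC1"))    # FNC1 exists in all three sets
```

So sequences such as “Start-C 12 34 Shift-A A 23” have no Shift symbol to use while in set C; leaving set C needs Code-A or Code-B.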

@barnhill
Owner

I'm interested in supporting GS1-128. As far as I know, it is a subset of Code 128: formatted data encoded with Code 128.

@fiatCurrency

I have what I believe is a much more robust method of correctly decomposing which substrings should be in subset C.

The situation is far more complex than 'must be at least 3'.

The complexity depends on something which appears not to be addressed at all: the FNC1 code.
That code can be encoded in sets A, B, or C.

So, a sequence "0000f66" in the middle of a string (where f means FNC1) would be better encoded in set C. Then there are odd-length digit runs which end at or start at an FNC1.

It isn't obvious how a sequence such as X55555f777f99999f44 should be encoded.
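
One way to explore such inputs without hand-written length rules is a small dynamic program over the position in the data and the current code set. The sketch below is not the method referred to in this comment and not the library's code. Its assumptions: the letter f stands for FNC1 (as in the notation above), sets A and B are treated as one set "AB" that can hold every character, the start symbol is counted, and check and stop symbols are not. GS1-specific rules are not modeled. `min_symbols` is a hypothetical name.

```python
FNC1 = "f"                      # assumption: 'f' marks FNC1, as in the comment above
DIGITS = set("0123456789")


def min_symbols(data):
    """Fewest symbols (start symbol included) needed to encode data."""
    n = len(data)
    # best[i][s]: fewest symbols for data[i:] when the encoder is already in set s
    best = [{"AB": 0, "C": 0} for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        # Each option: (set used for the next symbol, index after that symbol).
        options = [("AB", i + 1)]                 # any single character fits in A/B
        pair = data[i:i + 2]
        if data[i] == FNC1:
            options.append(("C", i + 1))          # FNC1 also exists in set C
        elif len(pair) == 2 and all(c in DIGITS for c in pair):
            options.append(("C", i + 2))          # one set-C symbol holds two digits
        for current in ("AB", "C"):
            best[i][current] = min(
                (0 if used == current else 1) + 1 + best[nxt][used]
                for used, nxt in options
            )
    return 1 + min(best[0].values())              # free choice of start symbol


print(min_symbols("A0000f66B"))
print(min_symbols("X55555f777f99999f44"))
```

With these assumptions, "A0000f66B" comes out one symbol shorter than staying in set A/B throughout, because the FNC1 does not force a switch out of set C.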
