Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tidy -utf interpreted as tidy -u -t -f instead of as -utf8, without error #921

Open
bryceharrington opened this issue Feb 11, 2021 · 8 comments
Milestone

Comments

@bryceharrington
Copy link

Forwarding a usability issue reported by a Ubuntu user, from https://bugs.launchpad.net/ubuntu/+source/tidy-html5/+bug/1914865

Running tidy -utf ... is treated as tidy -u -t -f ..., rather than as tidy -utf8 as the user intended.

$ echo "" | tidy > /tmp/test.html
$ tidy -utf /tmp/test.html > /dev/null
HTML Tidy: unknown option: t
HTML Tidy: unknown option: f
...
$ echo $?
0

The -u option is synonymous with -upper, producing upper-cased tags in the output, which causes unexpected results particularly if one has set "uppercase-tags: no" in ~/.tidyrc, such as in the original bug report.

It might help if tidy returned a non-zero exit code when invoked with unrecognized options. (Even better might be to exit immediately with a usage message.) Then the warnings would be more visible and the user might be able to figure out their error.

@geoffmcl
Copy link
Contributor

@bryceharrington, yes, I too would prefer that tidy exit immediately on encoutering an unknown option... like most other apps do... I have been caught by this many times...

But historically, the founders of tidy, back over 20 years ago, chose otherwise... so it has been like that forever...

But if someone were to present a patch or a PR, AND after further supportive feedback and discussion, it would certainly be considered... thanks...

@geoffmcl
Copy link
Contributor

@bryceharrington, seems I may have been far too hasty in my reply... sorry...

As I started to look at the issue more closely, I was immediately reminded of the current, perhaps quite unique, style of tidy, since its inception, and that is to process an extended command line, with multiple config/options and multiple input files... that is -

tidy [options-1] file [options-2] file2 [options-3] file3... etc...

So an exit(?) on an error in [options-1], or 2, etc... would stop the processing of subsequent options and files... quite a substantial change in tidy usage...

Such a big change may, or may not be acceptable...

And on another point you made, it is quite common to allow command line options to override anything given in a ~/.tidyrc, so I do not see anything that can be done about that...

For the power user, that builds tidy from source, it is possible to build tidy without that default ~/.tidyrc reading... see the cmake option -DENABLE_CONFIG_FILES:BOOL=OFF...

And some time ago we rejected a new option, like -no-config to not read this file, since it changed the processing order of the config files vs the command line... again a big change...

So, while I look forward to further feedback, comments, in general it seems nothing can be done, at this time, about either issue... sorry...

@geoffmcl
Copy link
Contributor

@bryceharrington, as mentioned in #681, am quite disturbed about the situation presented there...

While I still can not see drastically, dramatically, changing tidy's historic style of being able to process a set of files on the command line, I am starting to see that tidy should exit with other than 0, 1, or 2, which presently depends on the simple sum of all the files processed...

Even with no option errors, or file failures, tidy simply sums the warnings and errors, so if it does exit with 1 or 2, there is no way to tell which actual file caused this... not very helpful...

Parts of the current console tidy.c code (some code changes, re-lining and spacing removed...) -

    /* Read command line */
    while ( argc > 0 )
    {
        if (argc > 1 && argv[1][0] == '-') /* or "--" */
        {
            /* *** DEAL WITH OPTIONS - '-' or '--' */
            --argc;
            ++argv;
            continue;
        }
        if ( argc > 1 )
            htmlfil = argv[1];
            /* *** DEAL WITH A FILE */
        }
        contentErrors   += tidyErrorCount( tdoc );
        contentWarnings += tidyWarningCount( tdoc );
        accessWarnings  += tidyAccessWarningCount( tdoc ); /* note: not used! */
        --argc;
        ++argv;
        if ( argc <= 1 )
            break;
    } /* read command line loop */

And then, **after** the complete command line, and all file(s), processed -

    /* footnote printing only if errors or warnings */
    if ( contentErrors + contentWarnings > 0 )
        tidyErrorSummary(tdoc);
    /* prints the general info, if applicable */
    tidyGeneralInfo(tdoc);
    /* called to free hash tables etc. */
    tidyRelease( tdoc );

    /* return status can be used by scripts */
    if ( contentErrors > 0 )
        return 2;
    if ( contentWarnings > 0 )
        return 1;
    /* 0 signifies all is ok */
    return 0;
}

I am now thinking that tidy should likewise sum the fatal errors, that is where, usually, the status returned is less than zero, and exit with that negative value, or just -1... something like -

    /* return status can be used by scripts */
    if ( fatalErrors > 0 )
        return -1
    if ( contentErrors > 0 )
        return 2;
    if ( contentWarnings > 0 )
        return 1;
    /* 0 signifies all is ok */
    return 0;
}

And for sure, an unknown option should likewise be included in the fatal errors...

This would perpetuate the problem that, in the case of multiple files, you do not know which file caused the problem... but keep the full command line processing, before the exit...

But would really assist users who process the files one-by-one, say through a script... hopefully the majority...

I will try to find the time to code something along these lines, and present it in a branch for testing... and feedback...

Meantime, look forward to further comments, patches, ideas, ... related to this... especially those strongly for or against ... thanks...

@bryceharrington
Copy link
Author

Hi geoffmcl, thanks for continuing to mull over the issue. I sympathize with the need to balance historical compatibility when considering fixes.

And for sure, an unknown option should likewise be included in the fatal errors...

Yes, this seems like it would be a sensible improvement to tidy. The section of code where it prints the "HTML Tidy: unknown option: ..." could increment an unknownOption counter, which could be checked with the others to generate a non-zero exit code like -1.

@geoffmcl
Copy link
Contributor

geoffmcl commented Mar 2, 2021

20210226: Start issue-921 branch (D:\UTILS\tidy\tidy-921\build\temp-921)

@bryceharrington, as you suggest it is related to the code that prints HTML Tidy: unknown option: ..., but a little more... ;=))

This, in general, is the process I followed -

Documentation: See htidy-org \documentation_posts\2000-03-01-part_running.md... There may be several other portions of the docs to address...

Consider an option like -exit-on-error, but prob. with order of command line - rejected for now... and an option to revert this exit change, but again rejected it, for now...

Implementation notes:

  1. Establish a static uint errorCount = 0;, and increment if a single character option is not one of -iucgbnmeq...
  2. If a file failed - config, or html file...
  3. Options in config file - option->parser( doc, option ); value needs to be returned
    and ReportUnknownOption should increment something, that can be retrieved
    /* Update the count of each report type. /
    switch ( message->level ) {
    case TidyInfo: doc->infoMessages++; break;
    case TidyWarning: doc->warnings++; break; /
    tidyWarningCount( tdoc ); /
    case TidyConfig: doc->optionErrors++; break; /
    ND! Added!! tidyConfigErrorCount( TidyDoc tdoc ); /
    case TidyAccess: doc->accessErrors++; break; /
    NU! tidyAccessWarningCount( tdoc ); /
    case TidyError: doc->errors++; break; /
    tidyErrorCount( tdoc ); /
    case TidyBadDocument: doc->docErrors++; break; /
    NA! /
    case TidyFatal: /
    Ack! */ break;
    default: break; }
  4. Note quiet: yes suppresses some error that maybe should be shown??? Particularly TidyConfig
    which is now considered a fatal type error. fixed this - simple patch = don't kill go for this message type -
diff --git a/src/message.c b/src/message.c
index ee2e6c6..aa7f3e8 100644
--- a/src/message.c
+++ b/src/message.c
@@ -169,7 +169,7 @@ static void messageOut( TidyMessageImpl *message )
         go = go && message->code != STRING_CONTENT_LOOKS;
         go = go && message->code != STRING_NO_SYSID;
         go = go && message->level != TidyDialogueInfo;
-        go = go && message->level != TidyConfig;
+        /* go = go && message->level != TidyConfig; Is. #921 - these are errors, not informational! */
         go = go && message->level != TidyInfo;
         go = go && !(message->level >= TidyDialogueSummary &&
                             message->code != STRING_NEEDS_INTERVENTION);
  1. Hmmmm, now some option/config error are reported twice!!! fixed this - simple remove duplicate patch -
diff --git a/src/config.c b/src/config.c
index bae56b8..3dd22a0 100644
--- a/src/config.c
+++ b/src/config.c
@@ -1581,7 +1581,10 @@ Bool ParsePickList( TidyDocImpl* doc, const TidyOptionImpl* entry )
         return yes;
     }
   
-    TY_(ReportBadArgument)( doc, entry->name );
+    /* Is. #921 - this service calls 'GetParsePickListValue', which already emits
+     *  'ReportBadArgument' if it fails, so eliminate this duplicate call
+     * TY_(ReportBadArgument)( doc, entry->name );
+     */
     return no;
 }
 

Finally, maybe some error case has been missed, BUT it seems all the major bases covered! In addition to the above 2 patches, console tidy.c patch, so far...

diff --git a/console/tidy.c b/console/tidy.c
index 3a3d9ad..f5eb22a 100644
--- a/console/tidy.c
+++ b/console/tidy.c
@@ -49,6 +49,8 @@ static FILE* errout = NULL;
  ** @{
  */
 
+/** Fatal error count - like bad option, no files, etc - Exit -1 */
+static uint errorCount = 0; /* Is. #921 */
 
 /* MARK: - Miscellaneous Utilities */
 /***************************************************************************//**
@@ -2123,6 +2125,7 @@ int main( int argc, char** argv )
         if ( status != 0 ) {
             fprintf(errout, tidyLocalizedString( TC_MAIN_ERROR_LOAD_CONFIG ), TIDY_CONFIG_FILE, status);
             fprintf(errout, "\n");
+            errorCount++; /* Is. #921 - config file failed */
         }
     }
 #endif /* TIDY_CONFIG_FILE */
@@ -2133,6 +2136,7 @@ int main( int argc, char** argv )
         if ( status != 0 ) {
             fprintf(errout, tidyLocalizedString( TC_MAIN_ERROR_LOAD_CONFIG ), cfgfil, status);
             fprintf(errout, "\n");
+            errorCount++; /* Is. #921 - config file failed */
         }
     }
 #ifdef TIDY_USER_CONFIG_FILE
@@ -2142,6 +2146,7 @@ int main( int argc, char** argv )
         if ( status != 0 ) {
             fprintf(errout, tidyLocalizedString( TC_MAIN_ERROR_LOAD_CONFIG ), TIDY_USER_CONFIG_FILE, status);
             fprintf(errout, "\n");
+            errorCount++; /* Is. #921 - config file failed */
         }
     }
 #endif /* TIDY_USER_CONFIG_FILE */
@@ -2479,6 +2484,7 @@ int main( int argc, char** argv )
 
                             default:
                                 unknownOption( tdoc, c );
+                                errorCount++; /* Is. #921 - option error */
                                 break;
                         }
                     }
@@ -2506,6 +2512,8 @@ int main( int argc, char** argv )
             if ( tidyOptGetBool(tdoc, TidyEmacs) || tidyOptGetBool(tdoc, TidyShowFilename))
                 tidySetEmacsFile( tdoc, htmlfil );
             status = tidyParseFile( tdoc, htmlfil );
+            if (status < 0) /* Is. #921 - input file failed */
+                errorCount++;
         }
         else
         {
@@ -2547,6 +2555,7 @@ int main( int argc, char** argv )
         contentErrors   += tidyErrorCount( tdoc );
         contentWarnings += tidyWarningCount( tdoc );
         accessWarnings  += tidyAccessWarningCount( tdoc );
+        errorCount      += tidyConfigErrorCount( tdoc); /* Is. #921 - config error are fatal */
         
         --argc;
         ++argv;
@@ -2568,8 +2577,16 @@ int main( int argc, char** argv )
 
     /* called to free hash tables etc. */
     tidyRelease( tdoc );
-    
+
     /* return status can be used by scripts */
+
+    /* Is. #921 - one, or more fatal errors */
+    /* this can be many forms, from file does not exit, to an unknown option,
+     *  or malformed option, etc ...
+     */
+    if (errorCount > 0)   
+        return -1; /* Is. #921 */
+
     if ( contentErrors > 0 )
         return 2;
     

Regression Tests: Running the regression tests, using this tidy-5.7.45.I921.exe failed in 2 tests - 431716 and 629 - In both cases, the failure was in the first stage - comparing the exit code with an expected exit code. There was no difference in the actual outputs.

Case 431716 is a test of an unknown option, split: yes, in a config file. Previously this test would exit with a zero, and now exits with -1, but that is exactly what this issue is about - showing something is wrong in the config, so this is just a case of updating the test...

Case 629 is also a test of a config item, mute, containing an unknown value, FAKE_TAG, but the html input also causes a warning, for a proprietary attribute, thus would previously exit with 1, now with -1, for the config error, so again just a case of updating the test...

Still to do some more testing, but this is starting to look good to go!

Will get around to pushing this issue-921 branch, so others can easily test it, without having to apply the above patches...

Meantime look forward to any feedback, comments, ideas, patches, etc... thanks...

@geoffmcl
Copy link
Contributor

geoffmcl commented Mar 3, 2021

20210303: @bryceharrington, have now pushed the issue-921 branch...

Appreciated if you, and others, in various distros, could pull, and test this branch... and report...

$ cd html-tidy5 # get to cloned source dir
$ git pull # update src
$ git checkout issue-921 # get to the branch
$ cd build/cmake # get to a build directory
$ cmake ../.. [options] -DTIDY_RC_NUMBER=I921 # mark rc, and generate Makefile
$ make # build tidy

Certainly with this tidy rc, a bad option, like -utf should cause a -1 exit value... as should other file, config errors...

I recognize this is a bit of a compromise, since tidy will continue to process any input html file given... but it seems the best choice, at this time...

On the option -utf8, this should never be needed, since this is the default encoding selected by tidy... for a long time now...

Look forward to further feedback, comments, ideas, patches, testing, etc... thanks...

@geoffmcl
Copy link
Contributor

Created PR #931 for testing, feedback, etc ... thanks...

@balthisar
Copy link
Member

Looking at patch, I have a couple of issues.

Mostly, for exit status to be portable, it can only range from 0 to 255. This is easily fixed, however.

More importantly, if we're treating a bad config option as a failure, why are we still generating output? Shouldn't this be dependent on force-output? It seems counterintuitive to call this a fatal error but allow document generation, while errors prevent document generation without force-output.

@balthisar balthisar added this to the 5.9 milestone Jul 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants