12 An I/O Project: Building a Command Line Program

This chapter is a recap of the many skills you’ve learned so far and an exploration of a few more standard library features. We’ll build a command line tool that interacts with file and command line input/output to practice some of the Rust concepts you now have under your belt.

Rust’s speed, safety, single binary output, and cross-platform support make it an ideal language for creating command line tools, so for our project, we’ll make our own version of the classic command line search tool grep (globally search a regular expression and print). In the simplest use case, grep searches a specified file for a specified string. To do so, grep takes as its arguments a file path and a string. Then it reads the file, finds lines in that file that contain the string argument, and prints those lines.

Along the way, we’ll show how to make our command line tool use the terminal features that many other command line tools use. We’ll read the value of an environment variable to allow the user to configure the behavior of our tool. We’ll also print error messages to the standard error console stream (stderr) instead of standard output (stdout), so, for example, the user can redirect successful output to a file while still seeing error messages onscreen.

One Rust community member, Andrew Gallant, has already created a fully featured, very fast version of grep, called ripgrep. By comparison, our version will be fairly simple, but this chapter will give you some of the background knowledge you need to understand a real-world project such as ripgrep.

Our grep project will combine a number of concepts you’ve learned so far:

  • Organizing code (using what you learned about modules in Chapter 7)
  • Using vectors and strings (collections, Chapter 8)
  • Handling errors (Chapter 9)
  • Using traits and lifetimes where appropriate (Chapter 10)
  • Writing tests (Chapter 11)

We’ll also briefly introduce closures, iterators, and trait objects, which Chapters 13 and 17 will cover in detail.

12.1 Accepting Command Line Arguments

Let’s create a new project with, as always, cargo new. We’ll call our project minigrep to distinguish it from the grep tool that you might already have on your system.

$ cargo new minigrep
     Created binary (application) `minigrep` project
$ cd minigrep

The first task is to make minigrep accept its two command line arguments: the file path and a string to search for. That is, we want to be able to run our program with cargo run, two hyphens to indicate the following arguments are for our program rather than for cargo, a string to search for, and a path to a file to search in, like so:

$ cargo run -- searchstring example-filename.txt

Right now, the program generated by cargo new cannot process arguments we give it. Some existing libraries on crates.io can help with writing a program that accepts command line arguments, but because you’re just learning this concept, let’s implement this capability ourselves.

Reading the Argument Values

To enable minigrep to read the values of command line arguments we pass to it, we’ll need the std::env::args function provided in Rust’s standard library. This function returns an iterator of the command line arguments passed to minigrep. We’ll cover iterators fully in Chapter 13. For now, you only need to know two details about iterators: iterators produce a series of values, and we can call the collect method on an iterator to turn it into a collection, such as a vector, that contains all the elements the iterator produces.

The code in Listing 12-1 allows your minigrep program to read any command line arguments passed to it and then collect the values into a vector.

Filename: src/main.rs

use std::env;

fn main() {
    let args: Vec<String> = env::args().collect();
    dbg!(args);
}

Listing 12-1: Collecting the command line arguments into a vector and printing them

First, we bring the std::env module into scope with a use statement so we can use its args function. Notice that the std::env::args function is nested in two levels of modules. As we discussed in Chapter 7, in cases where the desired function is nested in more than one module, we’ve chosen to bring the parent module into scope rather than the function. By doing so, we can easily use other functions from std::env. It’s also less ambiguous than adding use std::env::args and then calling the function with just args, because args might easily be mistaken for a function that’s defined in the current module.

### The args Function and Invalid Unicode
Note that std::env::args will panic if any argument contains invalid Unicode. If your program needs to accept arguments containing invalid Unicode, use std::env::args_os instead. That function returns an iterator that produces OsString values instead of String values. We’ve chosen to use std::env::args here for simplicity, because OsString values differ per platform and are more complex to work with than String values.
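
As a hedged illustration of the note above, the following sketch reads the arguments with std::env::args_os and converts each one with to_string_lossy, which replaces invalid sequences with U+FFFD instead of panicking. The helper name lossy_args is hypothetical and is not part of minigrep.

```rust
use std::env;
use std::ffi::OsString;

// Hypothetical helper (not part of minigrep): converts OS-native arguments
// into Strings, replacing any invalid Unicode with U+FFFD rather than panicking.
fn lossy_args<I>(args: I) -> Vec<String>
where
    I: IntoIterator<Item = OsString>,
{
    args.into_iter()
        .map(|arg| arg.to_string_lossy().into_owned())
        .collect()
}

fn main() {
    // env::args_os yields OsString values, so it accepts arguments that
    // env::args would panic on.
    let args = lossy_args(env::args_os());
    println!("{args:?}");
}
```

A lossy conversion is only one option; a program that must preserve the exact bytes (for example, to open a file with an unusual name) would keep the OsString values as they are.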

On the first line of main, we call env::args, and we immediately use collect to turn the iterator into a vector containing all the values produced by the iterator. We can use the collect function to create many kinds of collections, so we explicitly annotate the type of args to specify that we want a vector of strings. Although we very rarely need to annotate types in Rust, collect is one function you do often need to annotate because Rust isn’t able to infer the kind of collection you want.
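
To make the point about annotations concrete, here is a small sketch, separate from minigrep, in which the same iterator chain is collected into two different collections. The function names collect_vec and collect_set are made up for this example.

```rust
use std::collections::HashSet;

// The annotation on the binding tells collect to build a Vec<String>.
fn collect_vec(words: &[&str]) -> Vec<String> {
    let v: Vec<String> = words.iter().map(|w| w.to_string()).collect();
    v
}

// The same chain can build a HashSet instead; here the target type is
// given with the "turbofish" syntax on collect itself.
fn collect_set(words: &[&str]) -> HashSet<String> {
    words.iter().map(|w| w.to_string()).collect::<HashSet<String>>()
}

fn main() {
    let words = ["needle", "haystack", "needle"];

    println!("{} items in the vector", collect_vec(&words).len());
    println!("{} distinct items in the set", collect_set(&words).len());
}
```

Without either form of annotation, the compiler has no way to choose between these collections, which is why collect so often needs one.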

Finally, we print the vector using the dbg! macro. Let’s try running the code first with no arguments and then with two arguments:

$ cargo run
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished dev [unoptimized + debuginfo] target(s) in 0.61s
     Running `target/debug/minigrep`
[src/main.rs:5] args = [
    "target/debug/minigrep",
]
$ cargo run -- needle haystack
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished dev [unoptimized + debuginfo] target(s) in 1.57s
     Running `target/debug/minigrep needle haystack`
[src/main.rs:5] args = [
    "target/debug/minigrep",
    "needle",
    "haystack",
]

Notice that the first value in the vector is "target/debug/minigrep", which is the name of our binary. This matches the behavior of the arguments list in C, letting programs use the name by which they were invoked in their execution. It’s often convenient to have access to the program name in case you want to print it in messages or change behavior of the program based on what command line alias was used to invoke the program. But for the purposes of this chapter, we’ll ignore it and save only the two arguments we need.
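
One possible way to ignore the program name is to skip the first element before collecting. The sketch below shows that alternative; the helper name user_args is hypothetical, and the rest of this chapter keeps the full vector and indexes into it instead.

```rust
use std::env;

// Hypothetical helper: drops the first element (the program name) and
// keeps only the arguments the user typed.
fn user_args<I>(args: I) -> Vec<String>
where
    I: IntoIterator<Item = String>,
{
    args.into_iter().skip(1).collect()
}

fn main() {
    let args = user_args(env::args());
    println!("{args:?}");
}
```

With this approach the search string would sit at index 0 rather than index 1, so the two styles should not be mixed.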

Saving the Argument Values in Variables

The program is currently able to access the values specified as command line arguments. Now we need to save the values of the two arguments in variables so we can use the values throughout the rest of the program. We do that in Listing 12-2.

Filename: src/main.rs

use std::env;

fn main() {
    let args: Vec<String> = env::args().collect();

    let query = &args[1];
    let file_path = &args[2];

    println!("Searching for {}", query);
    println!("In file {}", file_path);
}

Listing 12-2: Creating variables to hold the query argument and file path argument

As we saw when we printed the vector, the program’s name takes up the first value in the vector at args[0], so we’re starting arguments at index 1. The first argument minigrep takes is the string we’re searching for, so we put a reference to the first argument in the variable query. The second argument will be the file path, so we put a reference to the second argument in the variable file_path.

We temporarily print the values of these variables to prove that the code is working as we intend. Let’s run this program again with the arguments test and sample.txt:

$ cargo run -- test sample.txt
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished dev [unoptimized + debuginfo] target(s) in 0.0s
     Running `target/debug/minigrep test sample.txt`
Searching for test
In file sample.txt

Great, the program is working! The values of the arguments we need are being saved into the right variables. Later we’ll add some error handling to deal with certain potential erroneous situations, such as when the user provides no arguments; for now, we’ll ignore that situation and work on adding file-reading capabilities instead.

12.2 Reading a File

Now we’ll add functionality to read the file specified in the file_path argument. First, we need a sample file to test it with: we’ll use a file with a small amount of text over multiple lines with some repeated words. Listing 12-3 has an Emily Dickinson poem that will work well! Create a file called poem.txt at the root level of your project, and enter the poem “I’m Nobody! Who are you?”

Filename: poem.txt

I'm nobody! Who are you?
Are you nobody, too?
Then there's a pair of us - don't tell!
They'd banish us, you know.

How dreary to be somebody!
How public, like a frog
To tell your name the livelong day
To an admiring bog!

Listing 12-3: A poem by Emily Dickinson makes a good test case

With the text in place, edit src/main.rs and add code to read the file, as shown in Listing 12-4.

Filename: src/main.rs

use std::env;
use std::fs;

fn main() {
    // --snip--
    println!("In file {}", file_path);

    let contents = fs::read_to_string(file_path)
        .expect("Should have been able to read the file");

    println!("With text:\n{contents}");
}

Listing 12-4: Reading the contents of the file specified by the second argument

First, we bring in a relevant part of the standard library with a use statement: we need std::fs to handle files.

In main, the new statement fs::read_to_string takes the file_path, opens that file, and returns a value of type std::io::Result<String> that contains the file’s contents.

After that, we again add a temporary println! statement that prints the value of contents after the file is read, so we can check that the program is working so far.

Let’s run this code with any string as the first command line argument (because we haven’t implemented the searching part yet) and the poem.txt file as the second argument:

$ cargo run -- the poem.txt
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished dev [unoptimized + debuginfo] target(s) in 0.0s
     Running `target/debug/minigrep the poem.txt`
Searching for the
In file poem.txt
With text:
I'm nobody! Who are you?
Are you nobody, too?
Then there's a pair of us - don't tell!
They'd banish us, you know.

How dreary to be somebody!
How public, like a frog
To tell your name the livelong day
To an admiring bog!

Great! The code read and then printed the contents of the file. But the code has a few flaws. At the moment, the main function has multiple responsibilities: generally, functions are clearer and easier to maintain if each function is responsible for only one idea. The other problem is that we’re not handling errors as well as we could. The program is still small, so these flaws aren’t a big problem, but as the program grows, it will be harder to fix them cleanly. It’s good practice to begin refactoring early on when developing a program, because it’s much easier to refactor smaller amounts of code. We’ll do that next.

12.3 Refactoring to Improve Modularity and Error Handling

To improve our program, we’ll fix four problems that have to do with the program’s structure and how it’s handling potential errors. First, our main function now performs two tasks: it parses arguments and reads files. As our program grows, the number of separate tasks the main function handles will increase. As a function gains responsibilities, it becomes more difficult to reason about, harder to test, and harder to change without breaking one of its parts. It’s best to separate functionality so each function is responsible for one task.

This issue also ties into the second problem: although query and file_path are configuration variables to our program, variables like contents are used to perform the program’s logic. The longer main becomes, the more variables we’ll need to bring into scope; the more variables we have in scope, the harder it will be to keep track of the purpose of each. It’s best to group the configuration variables into one structure to make their purpose clear.

The third problem is that we’ve used expect to print an error message when reading the file fails, but the error message just prints Should have been able to read the file. Reading a file can fail in a number of ways: for example, the file could be missing, or we might not have permission to open it. Right now, regardless of the situation, we’d print the same error message for everything, which wouldn’t give the user any information!

Fourth, we use expect repeatedly to handle different errors, and if the user runs our program without specifying enough arguments, they’ll get an index out of bounds error from Rust that doesn’t clearly explain the problem. It would be best if all the error-handling code were in one place so future maintainers had only one place to consult the code if the error-handling logic needed to change. Having all the error-handling code in one place will also ensure that we’re printing messages that will be meaningful to our end users.

Let’s address these four problems by refactoring our project.
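
The third problem can be made concrete with a short sketch. The error returned by fs::read_to_string carries an ErrorKind that distinguishes a missing file from a permissions problem, so a program could report them differently. The helper describe and its message wording are assumptions made for this illustration; they are not the approach the refactoring in this chapter takes.

```rust
use std::fs;
use std::io::ErrorKind;

// Hypothetical helper: maps a few common I/O error kinds to short,
// user-oriented descriptions. The wording is illustrative only.
fn describe(kind: ErrorKind) -> &'static str {
    match kind {
        ErrorKind::NotFound => "file not found",
        ErrorKind::PermissionDenied => "permission denied",
        _ => "could not read file",
    }
}

fn main() {
    let file_path = "poem.txt";

    // Unlike expect, matching on the Result lets us say why reading failed.
    match fs::read_to_string(file_path) {
        Ok(contents) => println!("With text:\n{contents}"),
        Err(e) => eprintln!("Problem reading {file_path}: {}", describe(e.kind())),
    }
}
```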

Separation of Concerns for Binary Projects

The organizational problem of allocating responsibility for multiple tasks to the main function is common to many binary projects. As a result, the Rust community has developed guidelines for splitting the separate concerns of a binary program when main starts getting large. This process has the following steps:

  • Split your program into a main.rs and a lib.rs and move your program’s logic to lib.rs.

  • As long as your command line parsing logic is small, it can remain in main.rs.

  • When the command line parsing logic starts getting complicated, extract it from main.rs and move it to lib.rs.

The responsibilities that remain in the main function after this process should be limited to the following:

  • Calling the command line parsing logic with the argument values

  • Setting up any other configuration

  • Calling a run function in lib.rs

  • Handling the error if run returns an error

This pattern is about separating concerns: main.rs handles running the program, and lib.rs handles all the logic of the task at hand. Because you can’t test the main function directly, this structure lets you test all of your program’s logic by moving it into functions in lib.rs. The code that remains in main.rs will be small enough to verify its correctness by reading it. Let’s rework our program by following this process.
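
The shape this pattern leads to can be sketched as follows. This is only an outline under stated assumptions: the run function is shown in the same file so the sketch stays runnable, and its signature, its line-counting body, and the error messages are placeholders rather than minigrep’s actual logic.

```rust
use std::error::Error;
use std::process;

// In the layout described above, a function like this would live in
// src/lib.rs, where it can be tested directly. The parameters and the
// body are placeholders for this sketch.
fn run(query: &str, contents: &str) -> Result<usize, Box<dyn Error>> {
    if query.is_empty() {
        return Err("query must not be empty".into());
    }
    // Count the lines that contain the query.
    Ok(contents.lines().filter(|line| line.contains(query)).count())
}

fn main() {
    // main only wires things together and handles the error case.
    let contents = "How public, like a frog\nTo an admiring bog!";

    match run("frog", contents) {
        Ok(count) => println!("{count} matching line(s)"),
        Err(e) => {
            eprintln!("Application error: {e}");
            process::exit(1);
        }
    }
}
```

Because run takes plain values and returns a Result, it can be called from a test without starting the program, which is the main benefit the pattern aims for.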

Extracting the Argument Parser

We’ll extract the functionality for parsing arguments into a function that main will call to prepare for moving the command line parsing logic to src/lib.rs. Listing 12-5 shows the new start of main that calls a new function parse_config, which we’ll define in src/main.rs for the moment.

Filename: src/main.rs

fn main() {
    let args: Vec<String> = env::args().collect();

    let (query, file_path) = parse_config(&args);

    // --snip--
}

fn parse_config(args: &[String]) -> (&str, &str) {
    let query = &args[1];
    let file_path = &args[2];

    (query, file_path)
}

Listing 12-5: Extracting a parse_config function from main

We’re still collecting the command line arguments into a vector, but instead of assigning the argument value at index 1 to the variable query and the argument value at index 2 to the variable file_path within the main function, we pass the whole vector to the parse_config function. The parse_config function then holds the logic that determines which argument goes in which variable and passes the values back to main. We still create the query and file_path variables in main, but main no longer has the responsibility of determining how the command line arguments and variables correspond.

This rework may seem like overkill for our small program, but we’re refactoring in small, incremental steps. After making this change, run the program again to verify that the argument parsing still works. It’s good to check your progress often, to help identify the cause of problems when they occur.
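
One side benefit of the extraction is that parse_config can be exercised with a hand-built vector, without going through the command line at all. The following sketch assumes the function body from Listing 12-5; the sample argument values are made up.

```rust
// Same body as Listing 12-5.
fn parse_config(args: &[String]) -> (&str, &str) {
    let query = &args[1];
    let file_path = &args[2];

    (query, file_path)
}

fn main() {
    // A hand-built stand-in for what env::args().collect() would produce.
    let args: Vec<String> = ["minigrep", "needle", "haystack.txt"]
        .iter()
        .map(|s| s.to_string())
        .collect();

    let (query, file_path) = parse_config(&args);

    // The returned slices borrow from args, so args must outlive them.
    assert_eq!(query, "needle");
    assert_eq!(file_path, "haystack.txt");
    println!("Searching for {query} in {file_path}");
}
```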

Grouping Configuration Values

We can take another small step to improve the parse_config function further. At the moment, we’re returning a tuple, but then we immediately break that tuple into individual parts again. This is a sign that perhaps we don’t have the right abstraction yet.

Another indicator that shows there’s room for improvement is the config part of parse_config, which implies that the two values we return are related and are both part of one configuration value. We’re not currently conveying this meaning in the structure of the data other than by grouping the two values into a tuple; we’ll instead put the two values into one struct and give each of the struct fields a meaningful name. Doing so will make it easier for future maintainers of this code to understand how the different values relate to each other and what their purpose is.

Listing 12-6 shows the improvements to the parse_config function.

Filename: src/main.rs

fn main() {
    let args: Vec<String> = env::args().collect();

    let config = parse_config(&args);

    println!("Searching for {}", config.query);
    println!("In file {}", config.file_path);

    let contents = fs::read_to_string(config.file_path)
        .expect("Should have been able to read the file");

    // --snip--
}

struct Config {
    query: String,
    file_path: String,
}

fn parse_config(args: &[String]) -> Config {
    let query = args[1].clone();
    let file_path = args[2].clone();

    Config { query, file_path }
}

Listing 12-6: Refactoring parse_config to return an instance of a Config struct

We’ve added a struct named Config defined to have fields named query and file_path. The signature of parse_config now indicates that it returns a Config value. In the body of parse_config, where we used to return string slices that reference String values in args, we now define Config to contain owned String values. The args variable in main is the owner of the argument values and is only letting the parse_config function borrow them, which means we’d violate Rust’s borrowing rules if Config tried to take ownership of the values in args.

There are a number of ways we could manage the String data; the easiest, though somewhat inefficient, route is to call the clone method on the values. This will make a full copy of the data for the Config instance to own, which takes more time and memory than storing a reference to the string data. However, cloning the data also makes our code very straightforward because we don’t have to manage the lifetimes of the references; in this circumstance, giving up a little performance to gain simplicity is a worthwhile trade-off.
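
The effect of cloning can be seen in a small sketch that assumes the Config struct and parse_config function from Listing 12-6 and uses made-up argument values. After the call, args is still usable because it was only borrowed, and config holds independent copies that remain valid even after args is dropped.

```rust
struct Config {
    query: String,
    file_path: String,
}

// Same logic as Listing 12-6.
fn parse_config(args: &[String]) -> Config {
    let query = args[1].clone();
    let file_path = args[2].clone();

    Config { query, file_path }
}

fn main() {
    let args = vec![
        String::from("minigrep"),
        String::from("needle"),
        String::from("haystack.txt"),
    ];

    let config = parse_config(&args);

    // args was only borrowed, so it is still fully usable here...
    println!("args still has {} items", args.len());

    // ...and config owns its own copies, so it can outlive args.
    drop(args);
    println!("query = {}, file_path = {}", config.query, config.file_path);
}
```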

The Trade-Offs of Using clone
There’s a tendency among many Rustaceans to avoid using clone to fix ownership problems because of its runtime cost. In Chapter 13, you’ll learn how to use more efficient methods in this type of situation. But for now, it’s okay to copy a few strings to continue making progress because you’ll make these copies only once and your file path and query string are very small. It’s better to have a working program that’s a bit inefficient than to try to hyperoptimize code on your first pass. As you become more experienced with Rust, it’ll be easier to start with the most efficient solution, but for now, it’s perfectly acceptable to call clone.

We’ve updated main so it places the instance of Config returned by parse_config into a variable named config, and we updated the code that previously used the separate query and file_path variables so it now uses the fields on the Config struct instead.

Now our code more clearly conveys that query and file_path are related and that their purpose is to configure how the program will work. Any code that uses these values knows to find them in the config instance in the fields named for their purpose.

Creating a Constructor for Config

So far, we’ve extracted the logic responsible for parsing the command line arguments from main and placed it in the parse_config function. Doing so helped us to see that the query and file_path values were related and that relationship should be conveyed in our code. We then added a Config struct to name the related purpose of query and file_path and to be able to return the values’ names as struct field names from the parse_config function.

So now that the purpose of the parse_config function is to create a Config instance, we can change parse_config from a plain function to a function named new that is associated with the Config struct. Making this change will make the code more idiomatic. We can create instances of types in the standard library, such as String, by calling String::new. Similarly, by changing parse_config into a new function associated with Config, we’ll be able to create instances of Config by calling Config::new. Listing 12-7 shows the changes we need to make.

Filename: src/main.rs

fn main() {
    let args: Vec<String> = env::args().collect();

    let config = Config::new(&args);

    // --snip--
}

// --snip--

impl Config {
    fn new(args: &[String]) -> Config {
        let query = args[1].clone();
        let file_path = args[2].clone();

        Config { query, file_path }
    }
}

Listing 12-7: Changing parse_config into Config::new

We’ve updated main where we were calling parse_config to instead call Config::new. We’ve changed the name of parse_config to new and moved it within an impl block, which associates the new function with Config. Try compiling this code again to make sure it works.

Fixing the Error Handling

Now we’ll work on fixing our error handling. Recall that attempting to access the values in the args vector at index 1 or index 2 will cause the program to panic if the vector contains fewer than three items. Try running the program without any arguments; it will look like this:

$ cargo run
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished dev [unoptimized + debuginfo] target(s) in 0.0s
     Running `target/debug/minigrep`
thread 'main' panicked at 'index out of bounds: the len is 1 but the index is 1', src/main.rs:27:21
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

The line index out of bounds: the len is 1 but the index is 1 is an error message intended for programmers. It won’t help our end users understand what they should do instead. Let’s fix that now.

Improving the Error Message

In Listing 12-8, we add a check in the new function that will verify that the slice is long enough before accessing index 1 and 2. If the slice isn’t long enough, the program panics and displays a better error message.

Filename: src/main.rs

    // --snip--
    fn new(args: &[String]) -> Config {
        if args.len() < 3 {
            panic!("not enough arguments");
        }
        // --snip--

Listing 12-8: Adding a check for the number of arguments

This code is similar to the Guess::new function we wrote in Listing 9-13, where we called panic! when the value argument was out of the range of valid values. Instead of checking for a range of values here, we’re checking that the length of args is at least 3 and the rest of the function can operate under the assumption that this condition has been met. If args has fewer than three items, this condition will be true, and we call the panic! macro to end the program immediately.

With these extra few lines of code in new, let’s run the program without any arguments again to see what the error looks like now:

$ cargo run
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished dev [unoptimized + debuginfo] target(s) in 0.0s
     Running `target/debug/minigrep`
thread 'main' panicked at 'not enough arguments', src/main.rs:26:13
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

This output is better: we now have a reasonable error message. However, we also have extraneous information we don’t want to give to our users. Perhaps using the technique we used in Listing 9-13 isn’t the best to use here: a call to panic! is more appropriate for a programming problem than a usage problem, as discussed in Chapter 9. Instead, we’ll use the other technique you learned about in Chapter 9—returning a Result that indicates either success or an error.

Returning a Result Instead of Calling panic!

We can instead return a Result value that will contain a Config instance in the successful case and will describe the problem in the error case. We’re also going to change the function name from new to build because many programmers expect new functions to never fail. When Config::build is communicating to main, we can use the Result type to signal there was a problem. Then we can change main to convert an Err variant into a more practical error for our users without the surrounding text about thread 'main' and RUST_BACKTRACE that a call to panic! causes.

Listing 12-9 shows the changes we need to make to the return value of the function we’re now calling Config::build and the body of the function needed to return a Result. Note that this won’t compile until we update main as well, which we’ll do in the next listing.

Filename: src/main.rs

This code does not compile!
impl Config {
    fn build(args: &[String]) -> Result<Config, &'static str> {
        if args.len() < 3 {
            return Err("not enough arguments");
        }

        let query = args[1].clone();
        let file_path = args[2].clone();

        Ok(Config { query, file_path })
    }
}

Listing 12-9: Returning a Result from Config::build

Our build function returns a Result with a Config instance in the success case and a &'static str in the error case. Our error values will always be string literals that have the 'static lifetime.

We’ve made two changes in the body of the function: instead of calling panic! when the user doesn’t pass enough arguments, we now return an Err value, and we’ve wrapped the Config return value in an Ok. These changes make the function conform to its new type signature.

Returning an Err value from Config::build allows the main function to handle the Result value returned from the build function and exit the process more cleanly in the error case.
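
To make the caller's side concrete, here is a minimal, self-contained sketch of code consuming that Result with match. The describe helper is hypothetical (it is not part of minigrep) and exists only to show both outcomes without exiting the process.

```rust
struct Config {
    query: String,
    file_path: String,
}

impl Config {
    // Same shape as Listing 12-9: Ok(Config) on success, a static message on failure.
    fn build(args: &[String]) -> Result<Config, &'static str> {
        if args.len() < 3 {
            return Err("not enough arguments");
        }

        let query = args[1].clone();
        let file_path = args[2].clone();

        Ok(Config { query, file_path })
    }
}

// Hypothetical helper for illustration only: turns either outcome into a message.
fn describe(args: &[String]) -> String {
    match Config::build(args) {
        Ok(config) => format!("searching for {} in {}", config.query, config.file_path),
        Err(msg) => format!("error: {msg}"),
    }
}
```

Because the error is an ordinary value, the caller decides what to do with it; nothing in build itself ends the program.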

Calling Config::build and Handling Errors

To handle the error case and print a user-friendly message, we need to update main to handle the Result being returned by Config::build, as shown in Listing 12-10. We’ll also take the responsibility of exiting the command line tool with a nonzero error code away from panic! and instead implement it by hand. A nonzero exit status is a convention to signal to the process that called our program that the program exited with an error state.

Filename: src/main.rs

use std::process;

fn main() {
    let args: Vec<String> = env::args().collect();

    let config = Config::build(&args).unwrap_or_else(|err| {
        println!("Problem parsing arguments: {err}");
        process::exit(1);
    });

    // --snip--

Listing 12-10: Exiting with an error code if building a Config fails

In this listing, we’ve used a method we haven’t covered in detail yet: unwrap_or_else, which is defined on Result<T, E> by the standard library. Using unwrap_or_else allows us to define some custom, non-panic! error handling. If the Result is an Ok value, this method’s behavior is similar to unwrap: it returns the inner value Ok is wrapping. However, if the value is an Err value, this method calls the code in the closure, which is an anonymous function we define and pass as an argument to unwrap_or_else. We’ll cover closures in more detail in Chapter 13. For now, you just need to know that unwrap_or_else will pass the inner value of the Err, which in this case is the static string "not enough arguments" that we added in Listing 12-9, to our closure in the argument err that appears between the vertical pipes. The code in the closure can then use the err value when it runs.
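
A small sketch, separate from minigrep, may help show the two paths through unwrap_or_else. The port_or_default function and the fallback value 8080 are assumptions made up for this illustration.

```rust
// Hypothetical helper: parse a port number, falling back to a default on failure.
fn port_or_default(input: &str) -> u16 {
    input.parse::<u16>().unwrap_or_else(|err| {
        // This closure runs only for the Err variant; err is the inner error value.
        println!("Could not parse {input:?}: {err}; falling back to 8080");
        8080
    })
}
```

With an input such as "3000" the Ok value is returned untouched and the closure never runs; with "abc" the closure receives the parse error and supplies the fallback.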

We’ve added a new use line to bring process from the standard library into scope. The code in the closure that will be run in the error case is only two lines: we print the err value and then call process::exit. The process::exit function will stop the program immediately and return the number that was passed as the exit status code. This is similar to the panic!-based handling we used in Listing 12-8, but we no longer get all the extra output. Let’s try it:

$ cargo run
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished dev [unoptimized + debuginfo] target(s) in 0.48s
     Running `target/debug/minigrep`
Problem parsing arguments: not enough arguments

Great! This output is much friendlier for our users.

Extracting Logic from main

Now that we’ve finished refactoring the configuration parsing, let’s turn to the program’s logic. As we stated in “Separation of Concerns for Binary Projects”, we’ll extract a function named run that will hold all the logic currently in the main function that isn’t involved with setting up configuration or handling errors. When we’re done, main will be concise and easy to verify by inspection, and we’ll be able to write tests for all the other logic.

Listing 12-11 shows the extracted run function. For now, we’re just making the small, incremental improvement of extracting the function. We’re still defining the function in src/main.rs.

Filename: src/main.rs

fn main() {
    // --snip--

    println!("Searching for {}", config.query);
    println!("In file {}", config.file_path);

    run(config);
}

fn run(config: Config) {
    let contents = fs::read_to_string(config.file_path)
        .expect("Should have been able to read the file");

    println!("With text:\n{contents}");
}

// --snip--

Listing 12-11: Extracting a run function containing the rest of the program logic

The run function now contains all the remaining logic from main, starting from reading the file. The run function takes the Config instance as an argument.

Returning Errors from the run Function

With the remaining program logic separated into the run function, we can improve the error handling, as we did with Config::build in Listing 12-9. Instead of allowing the program to panic by calling expect, the run function will return a Result<T, E> when something goes wrong. This will let us further consolidate the logic around handling errors into main in a user-friendly way. Listing 12-12 shows the changes we need to make to the signature and body of run.

Filename: src/main.rs

use std::error::Error;

// --snip--

fn run(config: Config) -> Result<(), Box<dyn Error>> {
    let contents = fs::read_to_string(config.file_path)?;

    println!("With text:\n{contents}");

    Ok(())
}

Listing 12-12: Changing the run function to return Result

We’ve made three significant changes here. First, we changed the return type of the run function to Result<(), Box<dyn Error>>. This function previously returned the unit type, (), and we keep that as the value returned in the Ok case.

For the error type, we used the trait object Box<dyn Error> (and we’ve brought std::error::Error into scope with a use statement at the top). We’ll cover trait objects in Chapter 17. For now, just know that Box<dyn Error> means the function will return a type that implements the Error trait, but we don’t have to specify what particular type the return value will be. This gives us flexibility to return error values that may be of different types in different error cases. The dyn keyword is short for “dynamic.”

Second, we’ve removed the call to expect in favor of the ? operator, as we talked about in Chapter 9. Rather than panic! on an error, ? will return the error value from the current function for the caller to handle.

Third, the run function now returns an Ok value in the success case. We’ve declared the run function’s success type as () in the signature, which means we need to wrap the unit type value in the Ok value. This Ok(()) syntax might look a bit strange at first, but using () like this is the idiomatic way to indicate that we’re calling run for its side effects only; it doesn’t return a value we need.
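
The flexibility of Box<dyn Error> is easier to see when one function can fail in two different ways. The following sketch is not part of minigrep; the scaled function is a hypothetical example in which the ? operator forwards either a ParseIntError or a ParseFloatError through the same return type.

```rust
use std::error::Error;

// Hypothetical helper: parse an integer count and a floating-point factor,
// then multiply them. Two different error types can flow out through `?`.
fn scaled(count: &str, factor: &str) -> Result<f64, Box<dyn Error>> {
    let count: i32 = count.parse()?; // ParseIntError on failure
    let factor: f64 = factor.parse()?; // ParseFloatError on failure

    Ok(f64::from(count) * factor)
}
```

Each ? converts the concrete error into a Box<dyn Error> before returning it, which is why the signature does not need to name either error type.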

When you run this code, it will compile but will display a warning:

$ cargo run the poem.txt
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
warning: unused `Result` that must be used
  --> src/main.rs:19:5
   |
19 |     run(config);
   |     ^^^^^^^^^^^
   |
   = note: this `Result` may be an `Err` variant, which should be handled
   = note: `#[warn(unused_must_use)]` on by default

warning: `minigrep` (bin "minigrep") generated 1 warning
    Finished dev [unoptimized + debuginfo] target(s) in 0.71s
     Running `target/debug/minigrep the poem.txt`
Searching for the
In file poem.txt
With text:
I'm nobody! Who are you?
Are you nobody, too?
Then there's a pair of us - don't tell!
They'd banish us, you know.

How dreary to be somebody!
How public, like a frog
To tell your name the livelong day
To an admiring bog!

Rust tells us that our code ignored the Result value and the Result value might indicate that an error occurred. But we’re not checking to see whether or not there was an error, and the compiler reminds us that we probably meant to have some error-handling code here! Let’s rectify that problem now.

Handling Errors Returned from run in main

We’ll check for errors and handle them using a technique similar to one we used with Config::build in Listing 12-10, but with a slight difference:

Filename: src/main.rs

fn main() {
    // --snip--

    println!("Searching for {}", config.query);
    println!("In file {}", config.file_path);

    if let Err(e) = run(config) {
        println!("Application error: {e}");
        process::exit(1);
    }
}

We use if let rather than unwrap_or_else to check whether run returns an Err value and call process::exit(1) if it does. The run function doesn’t return a value that we want to unwrap in the same way that Config::build returns the Config instance. Because run returns () in the success case, we only care about detecting an error, so we don’t need unwrap_or_else to return the unwrapped value, which would only be ().

The bodies of the if let and the unwrap_or_else functions are the same in both cases: we print the error and exit.
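
As a standalone sketch of the same pattern, the code below uses if let Err(e) on a function whose success value is (). Both functions are hypothetical, and report returns an exit status instead of calling process::exit so that the example can be exercised from a test.

```rust
// Hypothetical check that succeeds with () or fails with a message.
fn check_query(query: &str) -> Result<(), String> {
    if query.is_empty() {
        return Err(String::from("query must not be empty"));
    }

    Ok(())
}

// Returns the exit status a command line tool might use: 0 for success, 1 for failure.
fn report(query: &str) -> i32 {
    // There is nothing useful to unwrap in the Ok case, so we only match on Err.
    if let Err(e) = check_query(query) {
        println!("Application error: {e}");
        return 1;
    }

    0
}
```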

Splitting Code into a Library Crate

Our minigrep project is looking good so far! Now we’ll split the src/main.rs file and put some code into the src/lib.rs file. That way we can test the code and have a src/main.rs file with fewer responsibilities.

Let’s move all the code that isn’t the main function from src/main.rs to src/lib.rs:

  • The run function definition
  • The relevant use statements
  • The definition of Config
  • The Config::build function definition

The contents of src/lib.rs should have the signatures shown in Listing 12-13 (we’ve omitted the bodies of the functions for brevity). Note that this won’t compile until we modify src/main.rs in Listing 12-14.

Filename: src/lib.rs

This code does not compile!
use std::error::Error;
use std::fs;

pub struct Config {
    pub query: String,
    pub file_path: String,
}

impl Config {
    pub fn build(args: &[String]) -> Result<Config, &'static str> {
        // --snip--
    }
}

pub fn run(config: Config) -> Result<(), Box<dyn Error>> {
    // --snip--
}

Listing 12-13: Moving Config and run into src/lib.rs

We’ve made liberal use of the pub keyword: on Config, on its fields and its build method, and on the run function. We now have a library crate that has a public API we can test!

Now we need to bring the code we moved to src/lib.rs into the scope of the binary crate in src/main.rs, as shown in Listing 12-14.

Filename: src/main.rs

use std::env;
use std::process;

use minigrep::Config;

fn main() {
    // --snip--
    if let Err(e) = minigrep::run(config) {
        // --snip--
    }
}

Listing 12-14: Using the minigrep library crate in src/main.rs

We add a use minigrep::Config line to bring the Config type from the library crate into the binary crate’s scope, and we prefix the run function with our crate name. Now all the functionality should be connected and should work. Run the program with cargo run and make sure everything works correctly.

Whew! That was a lot of work, but we’ve set ourselves up for success in the future. Now it’s much easier to handle errors, and we’ve made the code more modular. Almost all of our work will be done in src/lib.rs from here on out.

Let’s take advantage of this newfound modularity by doing something that would have been difficult with the old code but is easy with the new code: we’ll write some tests!

12.4 Developing the Library's Functionality with Test-Driven Development

Now that we’ve extracted the logic into src/lib.rs and left the argument collecting and error handling in src/main.rs, it’s much easier to write tests for the core functionality of our code. We can call functions directly with various arguments and check return values without having to call our binary from the command line.

In this section, we’ll add the searching logic to the minigrep program using the test-driven development (TDD) process with the following steps:

  • Write a test that fails and run it to make sure it fails for the reason you expect.
  • Write or modify just enough code to make the new test pass.
  • Refactor the code you just added or changed and make sure the tests continue to pass.
  • Repeat from step 1!

Though it’s just one of many ways to write software, TDD can help drive code design. Writing the test before you write the code that makes the test pass helps to maintain high test coverage throughout the process.

We’ll test drive the implementation of the functionality that will actually do the searching for the query string in the file contents and produce a list of lines that match the query. We’ll add this functionality in a function called search.

Writing a Failing Test

Because we don’t need them anymore, let’s remove the println! statements from src/lib.rs and src/main.rs that we used to check the program’s behavior. Then, in src/lib.rs, add a tests module with a test function, as we did in Chapter 11. The test function specifies the behavior we want the search function to have: it will take a query and the text to search, and it will return only the lines from the text that contain the query. Listing 12-15 shows this test, which won’t compile yet.

Filename: src/lib.rs

This code does not compile!
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn one_result() {
        let query = "duct";
        let contents = "\
Rust:
safe, fast, productive.
Pick three.";

        assert_eq!(vec!["safe, fast, productive."], search(query, contents));
    }
}

Listing 12-15: Creating a failing test for the search function we wish we had

This test searches for the string "duct". The text we’re searching is three lines, only one of which contains "duct" (Note that the backslash after the opening double quote tells Rust not to put a newline character at the beginning of the contents of this string literal). We assert that the value returned from the search function contains only the line we expect.
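
The effect of that backslash can be checked in isolation. In this small sketch (the function names are made up for illustration), the two literals differ only in whether a backslash follows the opening quote.

```rust
// The backslash right after the opening quote skips the newline that follows it.
fn with_backslash() -> &'static str {
    "\
Rust:
Pick three."
}

// Without the backslash, the literal starts with a newline character.
fn without_backslash() -> &'static str {
    "
Rust:
Pick three."
}
```

The first function returns a string that begins with "Rust:", while the second returns one that begins with "\n".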

We aren’t yet able to run this test and watch it fail because the test doesn’t even compile: the search function doesn’t exist yet! In accordance with TDD principles, we’ll add just enough code to get the test to compile and run by adding a definition of the search function that always returns an empty vector, as shown in Listing 12-16. Then the test should compile and fail because an empty vector doesn’t match a vector containing the line "safe, fast, productive."

Filename: src/lib.rs

pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    vec![]
}

Listing 12-16: Defining just enough of the search function so our test will compile

Notice that we need to define an explicit lifetime 'a in the signature of search and use that lifetime with the contents argument and the return value. Recall from Chapter 10 that the lifetime parameters specify which argument lifetime is connected to the lifetime of the return value. In this case, we indicate that the returned vector should contain string slices that reference slices of the argument contents (rather than the argument query).

In other words, we tell Rust that the data returned by the search function will live as long as the data passed into the search function in the contents argument. This is important! The data referenced by a slice needs to be valid for the reference to be valid; if the compiler assumes we’re making string slices of query rather than contents, it will do its safety checking incorrectly.
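
One way to see what the annotation buys us is a sketch in which the query is dropped before the result is used. The first_match and demo functions are hypothetical and not part of minigrep; first_match uses the same lifetime pattern as search but returns at most one line.

```rust
// Same lifetime pattern as search: the result borrows from contents, not from query.
fn first_match<'a>(query: &str, contents: &'a str) -> Option<&'a str> {
    for line in contents.lines() {
        if line.contains(query) {
            return Some(line);
        }
    }

    None
}

fn demo() -> String {
    let contents = String::from("Rust:\nsafe, fast, productive.\nPick three.");
    let found;

    {
        let query = String::from("duct");
        found = first_match(&query, &contents);
    } // query is dropped here, but found is still valid because it borrows from contents

    found.unwrap_or("no match").to_string()
}
```

If the signature tied the return value to query instead, the compiler would reject the use of found after the inner block ends.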

If we forget the lifetime annotations and try to compile this function, we’ll get this error:

$ cargo build
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
error[E0106]: missing lifetime specifier
  --> src/lib.rs:28:51
   |
28 | pub fn search(query: &str, contents: &str) -> Vec<&str> {
   |                      ----            ----         ^ expected named lifetime parameter
   |
   = help: this function's return type contains a borrowed value, but the signature does not say whether it is borrowed from `query` or `contents`
help: consider introducing a named lifetime parameter
   |
28 | pub fn search<'a>(query: &'a str, contents: &'a str) -> Vec<&'a str> {
   |              ++++         ++                 ++              ++

For more information about this error, try `rustc --explain E0106`.
error: could not compile `minigrep` due to previous error

Rust can’t possibly know which of the two arguments we need, so we need to tell it explicitly. Because contents is the argument that contains all of our text and we want to return the parts of that text that match, we know contents is the argument that should be connected to the return value using the lifetime syntax.

Other programming languages don’t require you to connect arguments to return values in the signature, but this practice will get easier over time. You might want to compare this example with the “Validating References with Lifetimes” section in Chapter 10.

Now let’s run the test:

$ cargo test
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished test [unoptimized + debuginfo] target(s) in 0.97s
     Running unittests src/lib.rs (target/debug/deps/minigrep-9cd200e5fac0fc94)

running 1 test
test tests::one_result ... FAILED

failures:

---- tests::one_result stdout ----
thread 'tests::one_result' panicked at 'assertion failed: `(left == right)`
  left: `["safe, fast, productive."]`,
 right: `[]`', src/lib.rs:44:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace


failures:
    tests::one_result

test result: FAILED. 0 passed; 1 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

error: test failed, to rerun pass `--lib`

Great, the test fails, exactly as we expected. Let’s get the test to pass!

Writing Code to Pass the Test

Currently, our test is failing because we always return an empty vector. To fix that and implement search, our program needs to follow these steps:

  • Iterate through each line of the contents.
  • Check whether the line contains our query string.
  • If it does, add it to the list of values we’re returning.
  • If it doesn’t, do nothing.
  • Return the list of results that match.

Let’s work through each step, starting with iterating through lines.

Iterating Through Lines with the lines Method

Rust has a helpful method to handle line-by-line iteration of strings, conveniently named lines, that works as shown in Listing 12-17. Note this won’t compile yet.

Filename: src/lib.rs

This code does not compile!
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    for line in contents.lines() {
        // do something with line
    }
}

Listing 12-17: Iterating through each line in contents

The lines method returns an iterator. We’ll talk about iterators in depth in Chapter 13, but recall that you saw this way of using an iterator in Listing 3-5, where we used a for loop with an iterator to run some code on each item in a collection.
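
To see what lines yields, the following sketch collects each item into a vector with a for loop. The collect_lines helper is hypothetical and only serves to make the iterator's output visible.

```rust
// Hypothetical helper: gather every item produced by lines() into a vector.
fn collect_lines(text: &str) -> Vec<&str> {
    let mut out = Vec::new();

    for line in text.lines() {
        // Each item is a slice of the original text without its line ending.
        out.push(line);
    }

    out
}
```

The line endings themselves are not included in the items: both "\n" and "\r\n" are stripped, and a trailing newline at the end of the text does not produce an extra empty line.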

Searching Each Line for the Query

Next, we’ll check whether the current line contains our query string. Fortunately, strings have a helpful method named contains that does this for us! Add a call to the contains method in the search function, as shown in Listing 12-18. Note this still won’t compile yet.

Filename: src/lib.rs

This code does not compile!
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    for line in contents.lines() {
        if line.contains(query) {
            // do something with line
        }
    }
}

Listing 12-18: Adding functionality to see whether the line contains the string in query

At the moment, we’re building up functionality. To get it to compile, we need to return a value from the body as we indicated we would in the function signature.

Storing Matching Lines

To finish this function, we need a way to store the matching lines that we want to return. For that, we can make a mutable vector before the for loop and call the push method to store a line in the vector. After the for loop, we return the vector, as shown in Listing 12-19.

Filename: src/lib.rs

pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    let mut results = Vec::new();

    for line in contents.lines() {
        if line.contains(query) {
            results.push(line);
        }
    }

    results
}

Listing 12-19: Storing the lines that match so we can return them

Now the search function should return only the lines that contain query, and our test should pass. Let’s run the test:

$ cargo test
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished test [unoptimized + debuginfo] target(s) in 1.22s
     Running unittests src/lib.rs (target/debug/deps/minigrep-9cd200e5fac0fc94)

running 1 test
test tests::one_result ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/main.rs (target/debug/deps/minigrep-9cd200e5fac0fc94)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests minigrep

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

Our test passed, so we know it works!

At this point, we could consider opportunities for refactoring the implementation of the search function while keeping the tests passing to maintain the same functionality. The code in the search function isn’t too bad, but it doesn’t take advantage of some useful features of iterators. We’ll return to this example in Chapter 13, where we’ll explore iterators in detail, and look at how to improve it.

Using the search Function in the run Function

Now that the search function is working and tested, we need to call search from our run function. We need to pass the config.query value and the contents that run reads from the file to the search function. Then run will print each line returned from search:

Filename: src/lib.rs

pub fn run(config: Config) -> Result<(), Box<dyn Error>> {
    let contents = fs::read_to_string(config.file_path)?;

    for line in search(&config.query, &contents) {
        println!("{line}");
    }

    Ok(())
}

We’re still using a for loop to return each line from search and print it.

Now the entire program should work! Let’s try it out, first with a word that should return exactly one line from the Emily Dickinson poem, “frog”:

$ cargo run -- frog poem.txt
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished dev [unoptimized + debuginfo] target(s) in 0.38s
     Running `target/debug/minigrep frog poem.txt`
How public, like a frog

Cool! Now let’s try a word that will match multiple lines, like “body”:

$ cargo run -- body poem.txt
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished dev [unoptimized + debuginfo] target(s) in 0.0s
     Running `target/debug/minigrep body poem.txt`
I'm nobody! Who are you?
Are you nobody, too?
How dreary to be somebody!

And finally, let’s make sure that we don’t get any lines when we search for a word that isn’t anywhere in the poem, such as “monomorphization”:

$ cargo run -- monomorphization poem.txt
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished dev [unoptimized + debuginfo] target(s) in 0.0s
     Running `target/debug/minigrep monomorphization poem.txt`

Excellent! We’ve built our own mini version of a classic tool and learned a lot about how to structure applications. We’ve also learned a bit about file input and output, lifetimes, testing, and command line parsing.
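
The three manual runs above could also be captured as unit tests so they are rechecked on every cargo test. This is only a sketch: the POEM constant and the manual_checks module are names invented here, and the poem is embedded in the source so the tests do not depend on poem.txt.

```rust
pub fn search<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    let mut results = Vec::new();

    for line in contents.lines() {
        if line.contains(query) {
            results.push(line);
        }
    }

    results
}

// The text of poem.txt, embedded so the sketch needs no file on disk.
pub const POEM: &str = "\
I'm nobody! Who are you?
Are you nobody, too?
Then there's a pair of us - don't tell!
They'd banish us, you know.

How dreary to be somebody!
How public, like a frog
To tell your name the livelong day
To an admiring bog!";

#[cfg(test)]
mod manual_checks {
    use super::*;

    #[test]
    fn one_matching_line() {
        assert_eq!(vec!["How public, like a frog"], search("frog", POEM));
    }

    #[test]
    fn several_matching_lines() {
        assert_eq!(
            vec![
                "I'm nobody! Who are you?",
                "Are you nobody, too?",
                "How dreary to be somebody!",
            ],
            search("body", POEM)
        );
    }

    #[test]
    fn no_matching_lines() {
        assert!(search("monomorphization", POEM).is_empty());
    }
}
```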

To round out this project, we’ll briefly demonstrate how to work with environment variables and how to print to standard error, both of which are useful when you’re writing command line programs.

12.5 Working with Environment Variables

We’ll improve minigrep by adding an extra feature: an option for case-insensitive searching that the user can turn on via an environment variable. We could make this feature a command line option and require that users enter it each time they want it to apply, but by instead making it an environment variable, we allow our users to set the environment variable once and have all their searches be case insensitive in that terminal session.

Writing a Failing Test for the Case-Insensitive search Function

We first add a new search_case_insensitive function that will be called when the environment variable has a value. We’ll continue to follow the TDD process, so the first step is again to write a failing test. We’ll add a new test for the new search_case_insensitive function and rename our old test from one_result to case_sensitive to clarify the differences between the two tests, as shown in Listing 12-20.

Filename: src/lib.rs

This code does not compile!
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn case_sensitive() {
        let query = "duct";
        let contents = "\
Rust:
safe, fast, productive.
Pick three.
Duct tape.";

        assert_eq!(vec!["safe, fast, productive."], search(query, contents));
    }

    #[test]
    fn case_insensitive() {
        let query = "rUsT";
        let contents = "\
Rust:
safe, fast, productive.
Pick three.
Trust me.";

        assert_eq!(
            vec!["Rust:", "Trust me."],
            search_case_insensitive(query, contents)
        );
    }
}

Listing 12-20: Adding a new failing test for the case-insensitive function we’re about to add

Note that we’ve edited the old test’s contents too. We’ve added a new line with the text "Duct tape." using a capital D that shouldn’t match the query "duct" when we’re searching in a case-sensitive manner. Changing the old test in this way helps ensure that we don’t accidentally break the case-sensitive search functionality that we’ve already implemented. This test should pass now and should continue to pass as we work on the case-insensitive search.

The new test for the case-insensitive search uses "rUsT" as its query. In the search_case_insensitive function we’re about to add, the query "rUsT" should match the line containing "Rust:" with a capital R and match the line "Trust me." even though both have different casing from the query. This is our failing test, and it will fail to compile because we haven’t yet defined the search_case_insensitive function. Feel free to add a skeleton implementation that always returns an empty vector, similar to the way we did for the search function in Listing 12-16 to see the test compile and fail.
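
The skeleton mentioned above might look like the following minimal sketch. It only mirrors the signature used later in Listing 12-21 and always returns an empty vector, so the case_insensitive test compiles and then fails at its assertion rather than at compile time. The main function is included only so the snippet runs on its own.

```rust
// Placeholder implementation: it ignores its inputs and never reports a match.
#[allow(unused_variables)]
pub fn search_case_insensitive<'a>(query: &str, contents: &'a str) -> Vec<&'a str> {
    Vec::new()
}

fn main() {
    let hits = search_case_insensitive("rUsT", "Rust:\nTrust me.");
    // With the placeholder in place, nothing is ever found.
    println!("{} matching lines", hits.len());
}
```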

Implementing the search_case_insensitive Function

The search_case_insensitive function, shown in Listing 12-21, will be almost the same as the search function. The only difference is that we’ll lowercase the query and each line so whatever the case of the input arguments, they’ll be the same case when we check whether the line contains the query.

Filename: src/lib.rs

pub fn search_case_insensitive<'a>(
    query: &str,
    contents: &'a str,
) -> Vec<&'a str> {
    let query = query.to_lowercase();
    let mut results = Vec::new();

    for line in contents.lines() {
        if line.to_lowercase().contains(&query) {
            results.push(line);
        }
    }

    results
}

Listing 12-21: Defining the search_case_insensitive function to lowercase the query and the line before comparing them

First, we lowercase the query string and store it in a shadowed variable with the same name. Calling to_lowercase on the query is necessary so no matter whether the user’s query is "rust", "RUST", "Rust", or "rUsT", we’ll treat the query as if it were "rust" and be insensitive to the case. While to_lowercase will handle basic Unicode, it won’t be 100% accurate. If we were writing a real application, we’d want to do a bit more work here, but this section is about environment variables, not Unicode, so we’ll leave it at that here.
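
To make the Unicode caveat concrete, here is a small sketch. The helper line_matches is a hypothetical name used only for this illustration; it applies the same lowercase-then-contains check as Listing 12-21 to a single line. Simple accented letters behave as expected, but the German "ß" is already lowercase and is not the same text as "ss", so a query of "STRASSE" does not find "Hauptstraße".

```rust
// Hypothetical helper: the per-line check from Listing 12-21 in isolation.
fn line_matches(query: &str, line: &str) -> bool {
    line.to_lowercase().contains(&query.to_lowercase())
}

fn main() {
    // ASCII and simple accented letters are lowercased consistently.
    assert!(line_matches("rUsT", "Trust me."));
    assert!(line_matches("ÉCOLE", "une école"));

    // "STRASSE" lowercases to "strasse", while "ß" stays "ß",
    // so this comparison reports no match.
    assert!(!line_matches("STRASSE", "Hauptstraße"));
}
```

A real application would probably reach for proper case folding or normalization instead, which is outside the scope of this section.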

Note that query is now a String rather than a string slice, because calling to_lowercase creates new data rather than referencing existing data. Say the query is "rUsT", as an example: that string slice doesn’t contain a lowercase u or t for us to use, so we have to allocate a new String containing "rust". When we pass query as an argument to the contains method now, we need to add an ampersand because the signature of contains is defined to take a string slice.
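
As a brief illustration of that point, the sketch below uses a hypothetical helper named lowered to show that to_lowercase hands back an owned String and leaves the original slice unchanged, and that borrowing the String with an ampersand is enough to pass it to contains.

```rust
// Hypothetical helper: returns a newly allocated, lowercased copy of the query.
fn lowered(query: &str) -> String {
    query.to_lowercase()
}

fn main() {
    let original = "rUsT";
    let query = lowered(original);

    // The original slice is untouched; `query` owns new data.
    assert_eq!(original, "rUsT");
    assert_eq!(query, "rust");

    // Borrowing the String lets us hand it to `contains`.
    assert!("trust me.".contains(&query));
    // Converting explicitly to a string slice works as well.
    assert!("trust me.".contains(query.as_str()));
}
```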

Next, we add a call to to_lowercase on each line to lowercase all characters. Now that we’ve converted line and query to lowercase, we’ll find matches no matter what the case of the query is.

Let’s see if this implementation passes the tests:

$ cargo test
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished test [unoptimized + debuginfo] target(s) in 1.33s
     Running unittests src/lib.rs (target/debug/deps/minigrep-9cd200e5fac0fc94)

running 2 tests
test tests::case_insensitive ... ok
test tests::case_sensitive ... ok

test result: ok. 2 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running unittests src/main.rs (target/debug/deps/minigrep-9cd200e5fac0fc94)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

   Doc-tests minigrep

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

Great! They passed. Now, let’s call the new search_case_insensitive function from the run function. First, we’ll add a configuration option to the Config struct to switch between case-sensitive and case-insensitive search. Adding this field will cause compiler errors because we aren’t initializing this field anywhere yet:

Filename: src/lib.rs

This code does not compile!
pub struct Config {
    pub query: String,
    pub file_path: String,
    pub ignore_case: bool,
}

We added the ignore_case field that holds a Boolean. Next, we need the run function to check the ignore_case field’s value and use that to decide whether to call the search function or the search_case_insensitive function, as shown in Listing 12-22. This still won’t compile yet.

Filename: src/lib.rs

This code does not compile!
pub fn run(config: Config) -> Result<(), Box<dyn Error>> {
    let contents = fs::read_to_string(config.file_path)?;

    let results = if config.ignore_case {
        search_case_insensitive(&config.query, &contents)
    } else {
        search(&config.query, &contents)
    };

    for line in results {
        println!("{line}");
    }

    Ok(())
}

Listing 12-22: Calling either search or search_case_insensitive based on the value in config.ignore_case

Finally, we need to check for the environment variable. The functions for working with environment variables are in the env module in the standard library, so we bring that module into scope at the top of src/lib.rs. Then we’ll use the var function from the env module to check to see if any value has been set for an environment variable named IGNORE_CASE, as shown in Listing 12-23.

Filename: src/lib.rs

use std::env;
// --snip--

impl Config {
    pub fn build(args: &[String]) -> Result<Config, &'static str> {
        if args.len() < 3 {
            return Err("not enough arguments");
        }

        let query = args[1].clone();
        let file_path = args[2].clone();

        let ignore_case = env::var("IGNORE_CASE").is_ok();

        Ok(Config {
            query,
            file_path,
            ignore_case,
        })
    }
}

Listing 12-23: Checking for any value in an environment variable named IGNORE_CASE

Here, we create a new variable ignore_case. To set its value, we call the env::var function and pass it the name of the IGNORE_CASE environment variable. The env::var function returns a Result that will be the successful Ok variant that contains the value of the environment variable if the environment variable is set to any value. It will return the Err variant if the environment variable is not set.

We’re using the is_ok method on the Result to check whether the environment variable is set, which means the program should do a case-insensitive search. If the IGNORE_CASE environment variable isn’t set to anything, is_ok will return false and the program will perform a case-sensitive search. We don’t care about the value of the environment variable, just whether it’s set or unset, so we’re checking is_ok rather than using unwrap, expect, or any of the other methods we’ve seen on Result.

We pass the value in the ignore_case variable to the Config instance so the run function can read that value and decide whether to call search_case_insensitive or search, as we implemented in Listing 12-22.
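
One consequence of checking only is_ok is worth seeing in isolation: any value counts as "set", including 0. The sketch below pulls the check into a hypothetical helper, ignore_case_from, so it can be exercised without touching the real environment. Note also that env::var returns an Err when the variable holds data that is not valid Unicode, which this check treats the same as "not set".

```rust
use std::env::{self, VarError};

// Hypothetical helper: the same `is_ok` check as Listing 12-23,
// applied to a Result that the caller supplies.
fn ignore_case_from(value: Result<String, VarError>) -> bool {
    value.is_ok()
}

fn main() {
    // The value itself is never inspected, so "0" still enables the option.
    assert!(ignore_case_from(Ok(String::from("1"))));
    assert!(ignore_case_from(Ok(String::from("0"))));
    // An unset variable yields Err(VarError::NotPresent).
    assert!(!ignore_case_from(Err(VarError::NotPresent)));

    // Reading the real environment, as Config::build does.
    let ignore_case = ignore_case_from(env::var("IGNORE_CASE"));
    println!("ignore_case = {ignore_case}");
}
```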

Let’s give it a try! First, we’ll run our program without the environment variable set and with the query to, which should match any line that contains the word “to” in all lowercase:

$ cargo run -- to poem.txt
   Compiling minigrep v0.1.0 (file:///projects/minigrep)
    Finished dev [unoptimized + debuginfo] target(s) in 0.0s
     Running `target/debug/minigrep to poem.txt`
Are you nobody, too?
How dreary to be somebody!

Looks like that still works! Now, let’s run the program with IGNORE_CASE set to 1 but with the same query to.

$ IGNORE_CASE=1 cargo run -- to poem.txt

If you’re using PowerShell, you will need to set the environment variable and run the program as separate commands:

PS> $Env:IGNORE_CASE=1; cargo run -- to poem.txt

This will make IGNORE_CASE persist for the remainder of your shell session. It can be unset with the Remove-Item cmdlet:

PS> Remove-Item Env:IGNORE_CASE

We should get lines that contain “to” that might have uppercase letters:

Are you nobody, too?
How dreary to be somebody!
To tell your name the livelong day
To an admiring bog!

Excellent, we also got lines containing “To”! Our minigrep program can now do case-insensitive searching controlled by an environment variable. Now you know how to manage options set using either command line arguments or environment variables.

Some programs allow arguments and environment variables for the same configuration. In those cases, the programs decide that one or the other takes precedence. For another exercise on your own, try controlling case sensitivity through either a command line argument or an environment variable. Decide whether the command line argument or the environment variable should take precedence if the program is run with one set to case sensitive and one set to ignore case.
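
One possible starting point for that exercise is sketched below. Everything in it is an assumption made for illustration: the flag names --ignore-case and --case-sensitive and the function resolve_ignore_case are not part of minigrep as written, and this sketch chooses to let the command line win over the environment variable. The opposite precedence would be an equally valid design.

```rust
use std::env;

// Hypothetical resolution rule: an explicit flag overrides the environment.
// If several flags are given, the last one wins.
fn resolve_ignore_case(args: &[String], env_is_set: bool) -> bool {
    let mut from_args: Option<bool> = None;
    for arg in args {
        match arg.as_str() {
            "--ignore-case" => from_args = Some(true),
            "--case-sensitive" => from_args = Some(false),
            _ => {}
        }
    }
    // Fall back to the environment only when no flag was supplied.
    from_args.unwrap_or(env_is_set)
}

fn main() {
    let args: Vec<String> = env::args().collect();
    let env_is_set = env::var("IGNORE_CASE").is_ok();
    println!("ignore_case = {}", resolve_ignore_case(&args, env_is_set));
}
```

A full solution would also need to keep these flags from being mistaken for the query or file path when Config::build reads its positional arguments.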

The std::env module contains many more useful features for dealing with environment variables: check out its documentation to see what is available.

12.6 Writing Error Messages to Standard Error Instead of Standard Output

At the moment, we’re writing all of our output to the terminal using the println! macro. In most terminals, there are two kinds of output: standard output (stdout) for general information and standard error (stderr) for error messages. This distinction enables users to choose to direct the successful output of a program to a file but still print error messages to the screen.

The println! macro is only capable of printing to standard output, so we have to use something else to print to standard error.

Checking Where Errors Are Written

First, let’s observe how the content printed by minigrep is currently being written to standard output, including any error messages we want to write to standard error instead. We’ll do that by redirecting the standard output stream to a file while intentionally causing an error. We won’t redirect the standard error stream, so any content sent to standard error will continue to display on the screen.

Command line programs are expected to send error messages to the standard error stream so we can still see error messages on the screen even if we redirect the standard output stream to a file. Our program is not currently well-behaved: we’re about to see that it saves the error message output to a file instead!

To demonstrate this behavior, we’ll run the program with > and the file path, output.txt, that we want to redirect the standard output stream to. We won’t pass any arguments, which should cause an error:

$ cargo run > output.txt

The > syntax tells the shell to write the contents of standard output to output.txt instead of the screen. We didn’t see the error message we were expecting printed to the screen, so that means it must have ended up in the file. This is what output.txt contains:

Problem parsing arguments: not enough arguments

Yup, our error message is being printed to standard output. It’s much more useful for error messages like this to be printed to standard error so only data from a successful run ends up in the file. We’ll change that.

Printing Errors to Standard Error

We’ll use the code in Listing 12-24 to change how error messages are printed. Because of the refactoring we did earlier in this chapter, all the code that prints error messages is in one function, main. The standard library provides the eprintln! macro that prints to the standard error stream, so let’s change the two places we were calling println! to print errors to use eprintln! instead.

Filename: src/main.rs

fn main() {
    let args: Vec<String> = env::args().collect();

    let config = Config::build(&args).unwrap_or_else(|err| {
        eprintln!("Problem parsing arguments: {err}");
        process::exit(1);
    });

    if let Err(e) = minigrep::run(config) {
        eprintln!("Application error: {e}");
        process::exit(1);
    }
}

Listing 12-24: Writing error messages to standard error instead of standard output using eprintln!

Let’s now run the program again in the same way, without any arguments and redirecting standard output with >:

$ cargo run > output.txt
Problem parsing arguments: not enough arguments

Now we see the error onscreen and output.txt contains nothing, which is the behavior we expect of command line programs.

Let’s run the program again with arguments that don’t cause an error but still redirect standard output to a file, like so:

$ cargo run -- to poem.txt > output.txt

We won’t see any output to the terminal, and output.txt will contain our results:

Filename: output.txt

Are you nobody, too?
How dreary to be somebody!

This demonstrates that we’re now using standard output for successful output and standard error for error output as appropriate.
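
The same separation can be checked without a shell redirect. The sketch below is not how minigrep is written: the report function is a hypothetical helper that takes its two output streams as parameters, so in-memory buffers can stand in for stdout and stderr. It uses writeln! from the standard library in place of println! and eprintln!.

```rust
use std::io::{self, Write};

// Hypothetical helper: successful lines go to `out`, error messages to `err`.
fn report<O: Write, E: Write>(
    outcome: Result<&str, &str>,
    out: &mut O,
    err: &mut E,
) -> io::Result<()> {
    match outcome {
        Ok(line) => writeln!(out, "{line}"),
        Err(msg) => writeln!(err, "Problem parsing arguments: {msg}"),
    }
}

fn main() -> io::Result<()> {
    let mut stdout = io::stdout();
    let mut stderr = io::stderr();
    // Goes to standard output, so a `>` redirect captures it.
    report(Ok("How dreary to be somebody!"), &mut stdout, &mut stderr)?;
    // Goes to standard error, so it stays on the screen.
    report(Err("not enough arguments"), &mut stdout, &mut stderr)?;
    Ok(())
}
```

Passing two Vec<u8> buffers instead of the real streams makes it possible to assert in a unit test that an error leaves the "standard output" buffer empty.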

12.7 Summary

This chapter recapped some of the major concepts you’ve learned so far and covered how to perform common I/O operations in Rust. By using command line arguments, files, environment variables, and the eprintln! macro for printing errors, you’re now prepared to write command line applications. Combined with the concepts in previous chapters, your code will be well organized, store data effectively in the appropriate data structures, handle errors nicely, and be well tested.

Next, we’ll explore some Rust features that were influenced by functional languages: closures and iterators.
