Port 'string-lessp' to Rust #217

DavidDeSimone · 2017-06-25T04:19:00Z

This is a WIP PR for porting string-lessp to Rust. As part of this work, it seemed useful to add an iterator for LispStringRef to replace the macro FETCH_STRING_CHAR_ADVANCE.

Work left to do:

- Port basic string-lessp functionality to Rust
- Remove string-lessp implementation in C
- Add code point iterator for LispStringRef
- Add safe API for working with LispSymbols, similar to LispStringRef/as_string/as_string_or_error
  - Fix issue where the value coming out of the LispSymbolRef's mem::transmute seems to be nonsense
- Verify that the alignment/size/padding of Rust's Lisp_Symbol matches the alignment/size/padding of C's Lisp_Symbol
- Basic profiling to make sure that these new concepts do not add significant cost over their C equivalents. (Based on my findings, performance for this function is equivalent to its C counterpart.)

birkenfeld · 2017-06-25T06:00:53Z

rust_src/src/lisp.rs

@@ -155,6 +155,10 @@ impl LispObject {
    pub fn is_symbol(self) -> bool {
        self.get_type() == LispType::Lisp_Symbol
    }
+
+    pub fn symbol_name(&self) -> LispObject {


This is not safe - it'll treat the LispObject as a symbol regardless of its type.

You'll have to introduce a LispSymbolRef newtype and respective as_symbol and as_symbol_or_error methods on LispObject to convert safely.

Good callout, I will do this next.
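
For illustration, the newtype approach suggested above could look roughly like the sketch below. It is a simplified model, not the remacs code: LispObject is modelled as a plain enum rather than a tagged machine word, and as_symbol_or_error returns a Result as a stand-in for signalling a Lisp wrong-type-argument error. Only the names LispSymbolRef, as_symbol, as_symbol_or_error and symbol_name come from the discussion.

```rust
// Hypothetical, simplified model of a tagged Lisp value. The real
// LispObject wraps a tagged machine word instead of a Rust enum.
#[derive(Clone, Copy, Debug, PartialEq)]
enum LispObject {
    Symbol(&'static str),
    Str(&'static str),
    Int(i64),
}

// Newtype that can only be obtained through a successful type check,
// so symbol_name can never run on a non-symbol.
#[derive(Clone, Copy, Debug, PartialEq)]
struct LispSymbolRef(&'static str);

impl LispSymbolRef {
    fn symbol_name(self) -> &'static str {
        self.0
    }
}

impl LispObject {
    // Checked conversion: None for anything that is not a symbol.
    fn as_symbol(self) -> Option<LispSymbolRef> {
        match self {
            LispObject::Symbol(name) => Some(LispSymbolRef(name)),
            _ => None,
        }
    }

    // Assumption: an Err stands in for signalling a Lisp error here.
    fn as_symbol_or_error(self) -> Result<LispSymbolRef, String> {
        self.as_symbol()
            .ok_or_else(|| format!("wrong-type-argument: symbolp, {:?}", self))
    }
}
```

With this shape, calling symbol_name on an integer or a string is a compile-time impossibility rather than a runtime reinterpretation of memory.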

birkenfeld · 2017-06-25T06:01:20Z

rust_src/src/multibyte.rs

@@ -106,6 +106,22 @@ impl LispStringRef {
    }
 }

+// Substitue for FETCH_STRING_CHAR_ADVANCE
+impl Iterator for LispStringRef {


Great idea to use an iterator here!

birkenfeld · 2017-06-25T06:02:46Z

rust_src/src/strings.rs

+        i1 += 1;
+
+        if codept1 != codept2 {
+            if codept1 < codept2 {


This is just return LispObject::from_bool(codept1 < codept2)

birkenfeld · 2017-06-25T06:02:59Z

rust_src/src/strings.rs

+        }
+    }
+
+    if i1 < lispstr2.len_bytes() {


same here (without return)

birkenfeld · 2017-06-25T06:05:03Z

rust_src/src/strings.rs

+    let end = cmp::min(lispstr1.len_bytes(), lispstr2.len_bytes());
+    let mut i1 = 0;
+    while i1 < end {
+        // Unwraps should be fine here, due to our manual tracking of


hmm, could this be done with a zip? I don't think we need i1 afterwards since if we finish the loop it will always be = end.

Another good callout; I've changed the implementation to use zip.

birkenfeld · 2017-06-26T15:56:47Z

rust_src/src/multibyte.rs

+                codepoint = cp;
+                self.cur += advance;
+            } else {
+                codepoint = ref_slice[self.cur] as u32;


.. as Codepoint?

birkenfeld · 2017-06-26T15:59:22Z

rust_src/src/multibyte.rs

+                self.cur += 1;
+            }
+
+            Some((codepoint, self.cur))


Hm, this returns the index of the next codepoint. Is this intended?

birkenfeld · 2017-06-26T15:59:52Z

rust_src/src/multibyte.rs

+}
+
+impl LispStringRef {
+    pub fn iter(&self) -> LispStringRefIterator {


Depending on need by other APIs, I think two iterators a la chars() and char_indices() would make sense.

This sounds reasonable to me; I like the idea of chars() and char_indices().
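
A minimal sketch of the two-iterator shape discussed here, assuming the string data is valid standard UTF-8 (Emacs's internal multibyte encoding is a superset, which this sketch does not handle). The type names CharIndices and Chars and the free functions are hypothetical; the PR's own types are LispStringRefIterator and friends. The index yielded is the byte offset where the current codepoint starts, not the offset of the next one.

```rust
type Codepoint = u32;

// Yields (byte offset of the current codepoint, codepoint).
struct CharIndices<'a> {
    bytes: &'a [u8],
    cur: usize,
}

impl<'a> Iterator for CharIndices<'a> {
    type Item = (usize, Codepoint);

    fn next(&mut self) -> Option<Self::Item> {
        let lead = *self.bytes.get(self.cur)?;
        // Sequence length and payload mask from the leading byte.
        // Assumption: the input is valid UTF-8.
        let (len, mask): (usize, u8) = match lead {
            0x00..=0x7F => (1, 0x7F),
            0xC0..=0xDF => (2, 0x1F),
            0xE0..=0xEF => (3, 0x0F),
            _ => (4, 0x07),
        };
        let start = self.cur;
        // Clamp so that truncated input cannot cause an out-of-bounds slice.
        let end = (start + len).min(self.bytes.len());
        let mut cp = (lead & mask) as Codepoint;
        for &b in &self.bytes[start + 1..end] {
            cp = (cp << 6) | (b & 0x3F) as Codepoint;
        }
        self.cur = end;
        Some((start, cp))
    }
}

// Thin wrapper that drops the index, in the spirit of str::chars().
struct Chars<'a>(CharIndices<'a>);

impl<'a> Iterator for Chars<'a> {
    type Item = Codepoint;

    fn next(&mut self) -> Option<Codepoint> {
        self.0.next().map(|(_, cp)| cp)
    }
}

fn char_indices(bytes: &[u8]) -> CharIndices<'_> {
    CharIndices { bytes, cur: 0 }
}

fn chars(bytes: &[u8]) -> Chars<'_> {
    Chars(char_indices(bytes))
}
```

Building chars() on top of char_indices() keeps the decoding logic in one place, which mirrors how the two standard-library iterators relate to each other.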

birkenfeld · 2017-06-26T16:03:16Z

rust_src/src/strings.rs

+        }
+    }
+
+    LispObject::from_bool(count < lispstr2.len_chars())


I think this is just lispstr1.len_chars() < lispstr2.len_chars(), and keeping count is unnecessary.

(If lispstr2 is shorter or equally long, this will test len2 < len2 which is false.)
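
To make the reasoning above concrete, here is a small sketch that uses Rust's own &str in place of LispStringRef (an assumption made only to keep the example self-contained). The zip loop stops at the shorter string, so if no differing pair is found the answer depends only on the two lengths.

```rust
// Sketch of the comparison logic, with &str standing in for LispStringRef.
fn string_lessp(a: &str, b: &str) -> bool {
    // zip ends as soon as either string runs out of characters.
    for (c1, c2) in a.chars().zip(b.chars()) {
        if c1 != c2 {
            // The first differing pair decides the result.
            return c1 < c2;
        }
    }
    // One string is a prefix of the other (or they are equal):
    // `a` is less only if it is strictly shorter, counted in characters.
    a.chars().count() < b.chars().count()
}
```

Note that the tie-break counts characters, not bytes, which matches the len_chars versus len_bytes fix made earlier in this PR.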

birkenfeld · 2017-06-26T16:03:55Z

rust_src/src/strings.rs

@@ -1,6 +1,6 @@
 //! Functions operating on strings.

-use std::ptr;
+use std::{ptr, cmp};


unused import?

birkenfeld · 2017-06-26T16:04:28Z

rust_src/src/lisp.rs

@@ -155,6 +155,10 @@ impl LispObject {
    pub fn is_symbol(self) -> bool {
        self.get_type() == LispType::Lisp_Symbol
    }
+
+    pub fn symbol_name(&self) -> LispObject {


birkenfeld · 2017-06-29T04:23:13Z

rust_src/src/strings.rs

 fn get_string_or_symbol(mut string: LispObject) -> multibyte::LispStringRef {
    if string.is_symbol() {
-        string = string.symbol_name()
+        string = string.as_symbol_or_error().symbol_name()


This still duplicates the symbol check. I'd make it

    match string.as_symbol() {
        Some(sym) => sym.symbol_name().as_string().expect("symbol name not a string?"),
        None => string.as_string_or_error(),
    }

If you like, you can also make string_equal use this function, it currently uses SYMBOL_NAME directly.

DavidDeSimone · 2017-07-04T04:13:16Z

rust_src/src/lisp.rs

+    #[inline]
+    pub fn as_symbol(&self) -> Option<LispSymbolRef> {
+        if self.is_symbol() {
+            Some(LispSymbolRef::new(unsafe { mem::transmute(self.get_untaggedptr()) }))


This looks to be incorrect. The code in lisp.h for XSYMBOL looks like

    INLINE struct Lisp_Symbol *
    (XSYMBOL) (Lisp_Object a)
    {
    #if USE_LSB_TAG
      return lisp_h_XSYMBOL (a);
    #else
      eassert (SYMBOLP (a));
      intptr_t i = (intptr_t) XUNTAG (a, Lisp_Symbol);
      void *p = (char *) lispsym + i;
      return p;
    #endif
    }

and it looks like we will have to emulate this logic for getting the address to mem::transmute.
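
A rough sketch of what emulating that logic means, under simplifying assumptions: the Symbol struct, the LISPSYM table and the xsymbol function below are all hypothetical stand-ins, and the sketch only accepts offsets that land inside the table, whereas the C code applies the same arithmetic to any symbol. The point it illustrates is that the untagged value is a byte offset from a base address, not an address in its own right.

```rust
use std::mem;

// Hypothetical stand-in for struct Lisp_Symbol.
#[repr(C)]
#[derive(Debug)]
struct Symbol {
    id: u64,
}

// Hypothetical stand-in for the Emacs global `lispsym`.
static LISPSYM: [Symbol; 3] = [Symbol { id: 10 }, Symbol { id: 20 }, Symbol { id: 30 }];

// Resolve an untagged symbol value by adding it to the table's base
// address, mirroring `(char *) lispsym + i` in the C code.
fn xsymbol(offset: usize) -> &'static Symbol {
    assert!(offset % mem::size_of::<Symbol>() == 0, "misaligned offset");
    assert!(offset < mem::size_of_val(&LISPSYM), "offset outside the table");
    let base = LISPSYM.as_ptr() as *const u8;
    // Safety: the offset was checked above to be aligned and in bounds.
    unsafe { &*(base.add(offset) as *const Symbol) }
}
```

Transmuting the untagged value directly would be equivalent to dropping the base address, which is consistent with the nonsense values seen earlier.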

shanavas786 · 2017-07-04T05:24:41Z

rust_src/remacs-sys/lib.rs

+// @TODO check the value of name post and pre transmutation, it seems that name is surviving but
+// may not be the correct value
+#[repr(C)]
+pub struct Lisp_Symbol {


gcmarkbit, redirect ...etc are missing ?

So this is my understanding of the situation re: the mentioned variables. If anyone spots any incorrect information please do not hesitate to correct me:

Lisp_Symbol has the following definition in lisp.h (I've stripped out the comments for clarity):

    struct Lisp_Symbol
    {
      bool_bf gcmarkbit : 1;
      ENUM_BF (symbol_redirect) redirect : 3;
      ENUM_BF (symbol_trapped_write) trapped_write : 2;
      unsigned interned : 2;
      bool_bf declared_special : 1;
      bool_bf pinned : 1;
      Lisp_Object name;
      union
      {
        Lisp_Object value;
        struct Lisp_Symbol *alias;
        struct Lisp_Buffer_Local_Value *blv;
        union Lisp_Fwd *fwd;
      } val;
      Lisp_Object function;
      Lisp_Object plist;
      struct Lisp_Symbol *next;
    };

This struct uses the C notation for bit fields. According to my research (rust-lang/rfcs#314, https://users.rust-lang.org/t/c-structs-with-bit-fields-and-ffi/1429), Rust does not support an equivalent of C bit fields under its #[repr(C)] attribute.

On my system (64-bit Ubuntu 16.04), no matter what the typedef of bool_bf or ENUM_BF, the bit field section of the struct takes 4 bytes. Due to alignment, the struct is padded, and offsetof(struct Lisp_Symbol, name) reports 8. My initial solution to representing this struct in Rust was to represent the bit field block as a u32, taking up the 4 bytes I mentioned earlier. I have not fully convinced myself that this is 100% the correct and portable thing to do.

Even if we can safely represent this block with a u32 on every system we support, it seems that there are gotchas with accessing these bit fields via the bitwise operators (due to endianness and compiler differences w.r.t. bit field implementation).

Overall, I am not sure of the best way to handle the interop for C structs that use bit fields that we need to access in Rust. It seems that one option is to simply pad the Rust struct as best we can, and if we need to access these fields, we will need to maintain C bindings that access them for us.
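
The "pad the struct and treat the bit fields as opaque" option could be sketched as follows. This is only an illustration under stated assumptions: Lisp_Object is assumed to be pointer-sized (which does not hold for wide-int builds), the val union is modelled as a single word, and the names LispSymbolLayout, name_offset and layout are hypothetical. The sketch deliberately exposes no accessors for individual bits, since their positions inside the u32 depend on the C compiler.

```rust
use std::mem;

// Hypothetical Rust mirror of the C struct's layout.
#[repr(C)]
struct LispSymbolLayout {
    bits: u32,       // opaque stand-in for the whole C bit-field block
    name: usize,     // assumption: Lisp_Object is pointer-sized
    val: usize,      // assumption: the union is one word wide
    function: usize,
    plist: usize,
    next: *const LispSymbolLayout,
}

// Byte offset of `name`, to compare against C's offsetof.
fn name_offset() -> usize {
    let sym = LispSymbolLayout {
        bits: 0,
        name: 0,
        val: 0,
        function: 0,
        plist: 0,
        next: std::ptr::null(),
    };
    let base = &sym as *const LispSymbolLayout as usize;
    let field = &sym.name as *const usize as usize;
    field - base
}

// (offset of name, total size), for checking against values reported by C.
fn layout() -> (usize, usize) {
    (name_offset(), mem::size_of::<LispSymbolLayout>())
}
```

On a typical 64-bit target this gives an offset of 8 for name, matching the offsetof result quoted above. Comparing both numbers against values printed from the C side would be one way to catch a mismatch early.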

…guments. This was due to an improper call to mem::transmute in the Rust layer. Symbols are a special case in which you cannot just call mem::transmute(self.get_untaggedptr()). Instead, one must offset the pointer value based on the memory address of an emacs global 'lispsym'.

birkenfeld · 2017-07-06T06:40:13Z

rust_src/src/multibyte.rs

+        LispStringRefCharIterator(self.iter())
+    }
+
+    pub fn char_indices(&self) -> LispStringRefIndexIterator {


Sorry to nitpick again, but actually char_indices returns the (index, char) tuple - you just need to rename iter.

No need to apologize, this should be fixed now.

DavidDeSimone · 2017-07-06T06:49:33Z

Overall, I feel pretty confident with the PR now. If any maintainers have additional feedback, I will be happy to address their concerns; otherwise, I think this is ready to merge.

Wilfred

This looks great to me: code looks clean :). Other than one missing docstring, I think it's good to merge.

Wilfred · 2017-07-09T22:53:04Z

rust_src/src/multibyte.rs

+
+pub struct LispStringRefCharIterator<'a>(LispStringRefIterator<'a>);
+
+// Substitue for FETCH_STRING_CHAR_ADVANCE


*Substitute

Wilfred · 2017-07-09T22:53:35Z

rust_src/src/strings.rs

@@ -202,6 +202,21 @@ fn string_to_unibyte(string: LispObject) -> LispObject {
    }
 }

+#[lisp_fn]


Please put the docstring here.

Wilfred · 2017-07-11T23:01:47Z

Marvellous :)

DavidDeSimone added 3 commits June 24, 2017 21:05

Porting 'string-lessp' to Rust. Adding a Codepoint iterator for LispS…

9527486

…tringRef. This commit is a work is part of a work in progress.

Merge branch 'master' into string-lessp

e0b5f93

Conflicts: rust_src/src/strings.rs

Fixing range bug in string-lessp caused by adding the byte offset, in…

c21e4d5

…stead of incrementing the char counter by 1 per iteration.

birkenfeld reviewed Jun 25, 2017

DavidDeSimone added 2 commits June 25, 2017 23:58

Adding LispStringRefIterator, an iterator used for iterating over the…

be284af

… codepoints on a LispString. Declaring string_lessp in Rust as a #[lisp_fn]. Removing c definition of string-lessp.

Fixing issue where lisp strings in string-lessp were using len_bytes …

420ab03

…instead of len_chars, causing a potential iteration panic. Adding iteration zip implementation for cleaner code.

birkenfeld reviewed Jun 26, 2017

David DeSimone added 2 commits June 28, 2017 11:49

Removing unused cmp import. Updating LispStringRefIterator to return …

0bc4752

…the current codepoint location, not the next offset location. Updating string-lessp logic to not need an explicit 'count' variable

Laying the groundwork for representing a lisp symbol as a rust struct…

ae8d366

…. This will allow us to create a LispSymbolRef like we have a LispStringRef. This will also allow a similar API for working with symbols as we have for strings.

birkenfeld reviewed Jun 29, 2017

Adding C representation of Lisp_Symbol. This includes making an untag…

010b742

…ged Rust union for a SymbolUnion. Updating the implementation of 'get_string_or_symbol' to avoid an additional symbol check.

DavidDeSimone commented Jul 4, 2017

shanavas786 reviewed Jul 4, 2017

DavidDeSimone added 6 commits July 5, 2017 01:26

Running rust-fmt for properly formatted code.

cf859d5

Another run of cargo fmt in remacs-sys

bf956b9

Adding "chars()", and "char_indicies()" implementation for LispString…

f169c24

…Ref. This will allow a user to loop over the codepoints of a LispStringRef, or the indicies of the codepoints of a LispStringRef.

Updating the iterator API of a LispStringRef to match std::string::St…

adaf0ef

…ring

Updating as_symbol_or_string to have a more descriptive function name

cdef968

birkenfeld reviewed Jul 6, 2017

DavidDeSimone changed the title from "[WIP] Port 'string-lessp' to Rust" to "Port 'string-lessp' to Rust" Jul 6, 2017

birkenfeld approved these changes Jul 6, 2017

Wilfred requested changes Jul 9, 2017

Adding docstring to Rust impl for string-lessp. Fixing spelling error…

567941d

… in comment about LispStringRefIterator

Wilfred approved these changes Jul 11, 2017

Wilfred merged commit 008064a into remacs:master Jul 11, 2017

DavidDeSimone deleted the string-lessp branch July 11, 2017 23:07
