New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Port 'string-lessp' to Rust #217
Conversation
…tringRef. This commit is a work is part of a work in progress.
Conflicts: rust_src/src/strings.rs
…stead of incrementing the char counter by 1 per iteration.
rust_src/src/lisp.rs
Outdated
@@ -155,6 +155,10 @@ impl LispObject { | |||
pub fn is_symbol(self) -> bool { | |||
self.get_type() == LispType::Lisp_Symbol | |||
} | |||
|
|||
pub fn symbol_name(&self) -> LispObject { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not safe - it'll treat the LispObject
as a symbol regardless of its type.
You'll have to introduce a LispSymbolRef
newtype and respective as_symbol
and as_symbol_or_error
methods on LispObject
to convert safely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good callout, I will do this next.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
rust_src/src/multibyte.rs
Outdated
@@ -106,6 +106,22 @@ impl LispStringRef { | |||
} | |||
} | |||
|
|||
// Substitue for FETCH_STRING_CHAR_ADVANCE | |||
impl Iterator for LispStringRef { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea to use an iterator here!
rust_src/src/strings.rs
Outdated
i1 += 1; | ||
|
||
if codept1 != codept2 { | ||
if codept1 < codept2 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just return LispObject::from_bool(codept1 < codept2)
rust_src/src/strings.rs
Outdated
} | ||
} | ||
|
||
if i1 < lispstr2.len_bytes() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here (without return
)
rust_src/src/strings.rs
Outdated
let end = cmp::min(lispstr1.len_bytes(), lispstr2.len_bytes()); | ||
let mut i1 = 0; | ||
while i1 < end { | ||
// Unwraps should be fine here, due to our manual tracking of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, could this be done with a zip
? I don't think we need i1
afterwards since if we finish the loop it will always be = end
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also another good call out, I've changed the implementation to use zip
… codepoints on a LispString. Declaring string_lessp in Rust as a #[lisp_fn]. Removing c definition of string-lessp.
…instead of len_chars, causing a potential iteration panic. Adding iteration zip implementation for cleaner code.
rust_src/src/multibyte.rs
Outdated
codepoint = cp; | ||
self.cur += advance; | ||
} else { | ||
codepoint = ref_slice[self.cur] as u32; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.. as Codepoint?
rust_src/src/multibyte.rs
Outdated
self.cur += 1; | ||
} | ||
|
||
Some((codepoint, self.cur)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, this returns the index of the next codepoint. Is this intended?
rust_src/src/multibyte.rs
Outdated
} | ||
|
||
impl LispStringRef { | ||
pub fn iter(&self) -> LispStringRefIterator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending on need by other APIs, I think two iterators a la chars()
and char_indices()
would make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds reasonable to me, I like the idea of a chars() and char_indicies().
rust_src/src/strings.rs
Outdated
} | ||
} | ||
|
||
LispObject::from_bool(count < lispstr2.len_chars()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is just lispstr1.len_chars() < lispstr2.len_chars()
, and keeping count
is unnecessary.
(If lispstr2 is shorter or equally long, this will test len2 < len2
which is false.)
rust_src/src/strings.rs
Outdated
@@ -1,6 +1,6 @@ | |||
//! Functions operating on strings. | |||
|
|||
use std::ptr; | |||
use std::{ptr, cmp}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unused import?
rust_src/src/lisp.rs
Outdated
@@ -155,6 +155,10 @@ impl LispObject { | |||
pub fn is_symbol(self) -> bool { | |||
self.get_type() == LispType::Lisp_Symbol | |||
} | |||
|
|||
pub fn symbol_name(&self) -> LispObject { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
…the current codepoint location, not the next offset location. Updating string-lessp logic to not need an explicit 'count' variable
…. This will allow us to create a LispSymbolRef like we have a LispStringRef. This will also allow a similar API for working with symbols as we have for strings.
rust_src/src/strings.rs
Outdated
fn get_string_or_symbol(mut string: LispObject) -> multibyte::LispStringRef { | ||
if string.is_symbol() { | ||
string = string.symbol_name() | ||
string = string.as_symbol_or_error().symbol_name() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still duplicates the symbol check. I'd make it
match string.as_symbol() {
Some(sym) => sym.symbol_name().as_string().expect("symbol name not a string?")
None => string.as_string_or_error()
}
If you like, you can also make string_equal
use this function, it currently uses SYMBOL_NAME directly.
…ged Rust union for a SymbolUnion. Updating the implementation of 'get_string_or_symbol' to avoid an additional symbol check.
rust_src/src/lisp.rs
Outdated
#[inline] | ||
pub fn as_symbol(&self) -> Option<LispSymbolRef> { | ||
if self.is_symbol() { | ||
Some(LispSymbolRef::new(unsafe { mem::transmute(self.get_untaggedptr()) })) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks to be incorrect. The code in lisp.h for XSYMBOL looks like
INLINE struct Lisp_Symbol *
(XSYMBOL) (Lisp_Object a)
{
#if USE_LSB_TAG
return lisp_h_XSYMBOL (a);
#else
eassert (SYMBOLP (a));
intptr_t i = (intptr_t) XUNTAG (a, Lisp_Symbol);
void *p = (char *) lispsym + i;
return p;
#endif
}
and it looks like we will have to emulate this logic for getting the address to mem::transmute.
rust_src/remacs-sys/lib.rs
Outdated
// @TODO check the value of name post and pre transmutation, it seems that name is surviving but | ||
// may not be the correct value | ||
#[repr(C)] | ||
pub struct Lisp_Symbol { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gcmarkbit, redirect ...etc are missing ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is my understanding of the situation re: the mentioned variables. If anyone spots any incorrect information please do not hesitate to correct me:
Lisp_Symbol has the following definition in lisp.h (I've stripped out the comments for clarity):
struct Lisp_Symbol
{
bool_bf gcmarkbit : 1;
ENUM_BF (symbol_redirect) redirect : 3;
ENUM_BF (symbol_trapped_write) trapped_write : 2;
unsigned interned : 2;
bool_bf declared_special : 1;
bool_bf pinned : 1;
Lisp_Object name;
union {
Lisp_Object value;
struct Lisp_Symbol *alias;
struct Lisp_Buffer_Local_Value *blv;
union Lisp_Fwd *fwd;
} val;
Lisp_Object function;
Lisp_Object plist;
struct Lisp_Symbol *next;
This struct is using the C notation for bit fields. According to my research, Rust does not (rust-lang/rfcs#314, https://users.rust-lang.org/t/c-structs-with-bit-fields-and-ffi/1429) support setting up an equivalent for C bitfieldsin it's #[repr(C)]
directive.
On my system (64-bit Ubuntu 16.04), no matter what the typedef of bf_bool or ENUM_BF, the bit field section of the struct takes 4 bytes. Due to alignment, the struct is padded, and offsetof(struct Lisp_Symbol, name) reports 8. My initial solution to representing this struct in Rust was to represent the bit field block as a u32, taking up the 4 bytes I mentioned earlier. I have not fully convinced myself that this is 100% the correct and portable thing to do.
Even if we can safely represent this block with a u32 on every system we support, it seems that there are 'gotcha's with accessing these bit fields via the bit wise operators (due to endianness, and compiler differences w.r.t bit field implementation.)
Overall I am not sure the best way to handle the interop for C structs that use bit fields that we need to access in Rust. It seems that one option is to simply pad the Rust struct as best we can, and if we need to access these fields, we will need to maintain C bindings that access them for us.
…guments. This was due to an improper call to mem::transmute in the Rust layer. Symbols are a special case in which you cannot just call mem::transmute(self.get_untaggedptr()). Instead, one must offset the pointer value based on the memory address of an emacs global 'lispsym'.
…Ref. This will allow a user to loop over the codepoints of a LispStringRef, or the indicies of the codepoints of a LispStringRef.
rust_src/src/multibyte.rs
Outdated
LispStringRefCharIterator(self.iter()) | ||
} | ||
|
||
pub fn char_indices(&self) -> LispStringRefIndexIterator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to nitpick again, but actually char_indices
returns the (index, char)
tuple - you just need to rename iter
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need to apologize, this should be fixed now.
Overall, I feel pretty confident with the PR now. If any maintainers have additional feedback, I will be happy to address their concerns, otherwise I think this is ready to merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great to me: code looks clean :). Other than one missing docstring, I think it's good to merge.
rust_src/src/multibyte.rs
Outdated
|
||
pub struct LispStringRefCharIterator<'a>(LispStringRefIterator<'a>); | ||
|
||
// Substitue for FETCH_STRING_CHAR_ADVANCE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*Substitute
rust_src/src/strings.rs
Outdated
@@ -202,6 +202,21 @@ fn string_to_unibyte(string: LispObject) -> LispObject { | |||
} | |||
} | |||
|
|||
#[lisp_fn] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please put the docstring here.
… in comment about LispStringRefIterator
Marvellous :) |
This is a WIP PR for porting string-lessp to Rust. As part of this work, it seemed useful to add an iterator for a LispStringRef, in order to replace the macro FETCH_STRING_CHAR_ADVANCE
Work left to do: