Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

macros for using factory calibration values #3

Open
spth opened this issue Jul 6, 2020 · 7 comments
Open

macros for using factory calibration values #3

spth opened this issue Jul 6, 2020 · 7 comments

Comments

@spth
Copy link

spth commented Jul 6, 2020

Now that SDCC generates efficient code for assignment from an address that is just a cast integer literal to a __sfr, IMO the asm macros for factory clibration aren't needed anymore.
Inline asm tends to interfere with optimizations, so for larger functions containing a use of factory calibration values, the assignment is likely to result in more efficient code being generated by SDCC.

@serisman
Copy link
Collaborator

serisman commented Jul 6, 2020

It seems like even with the latest SDCC nightly snapshot (v4.0.2 #11715) it still generates much less efficient code:

With the current code:

  #define PDK_USE_FACTORY_IHRCR_16MHZ() \
    __asm__( \
      "call #("_STR(FACTORY_IHRCR_ADDR)")   \n" \
      "mov "_STR_VAR(IHRCR)",a              \n" \
    )

... the following assembly is generated (2 words, 5 cycles):

	call	#(0x07ed)   ; 2 cycles + 2 cycles for ret I
	mov	__ihrcr,a      ; 1 cycle

With the new code:

  #define FACTORY_IHRCR (*((const unsigned char*)(FACTORY_IHRCR_ADDR)))
  #define PDK_USE_FACTORY_IHRCR_16MHZ()     IHRCR = FACTORY_IHRCR

... it still generates this less efficient (larger/slower) code (5 words + 12 words for __gptrget, 6 cycles + 15 cycles for __gptrget):

	mov	a, #0xed     ; 1 cycle
	mov	p, a             ; 1 cycle
	mov	a, #0x07     ; 1 cycle
	call	__gptrget   ; 2 cycles
	mov	__ihrcr, a    ; 1 cycle

__gptrget:
	sub	a, #0x80    ; 1 cycle
	t1sn	f, c             ; 1 cycle (skip not taken)
	  goto	code  ; 2 cycles
	idxm	a, p    ; (not executed)
	ret                     ; (not executed)
code:
	xch	a, p            ; 1 cycle
	push	af       ; 1 cycle
	mov	a, sp           ; 1 cycle
	add	a, #-1         ; 1 cycle
	xch	a, p            ; 1 cycle
	idxm	p, a    ; 2 cycles
	ret                     ; 2 cycles + 2 cycles for ret I (this jumps to the address pointed at in code which contains a ret I)

Were you expecting something more efficient to be generated?

Granted, depending on the program, the 12 word __gptrget method may already be included, but even so, it is still larger/slower than the existing code.

The most likely place for this factory calibration code to be used is the _sdcc_external_startup method, which should really be kept to the bare minimum anyway.

@serisman
Copy link
Collaborator

serisman commented Jul 6, 2020

Ok, I may have found a better solution:

This code:

  typedef unsigned char (*funcptr)(void);
  #define GET_FACTORY_IHRCR ((funcptr)FACTORY_IHRCR_ADDR)
  #define PDK_USE_FACTORY_IHRCR_16MHZ()     IHRCR = GET_FACTORY_IHRCR()

... generates the same efficient assembly (2 words, 5 cycles):

	call	0x07ED     ; 2 cycles + 2 cycles for ret I
	mov	__ihrcr, a   ; 1 cycle

@serisman
Copy link
Collaborator

serisman commented Jul 6, 2020

Fixed with: 451f449

@spth
Copy link
Author

spth commented Jul 11, 2020

Ok, I may have found a better solution:

This code:

  typedef unsigned char (*funcptr)(void);
  #define GET_FACTORY_IHRCR ((funcptr)FACTORY_IHRCR_ADDR)
  #define PDK_USE_FACTORY_IHRCR_16MHZ()     IHRCR = GET_FACTORY_IHRCR()

... generates the same efficient assembly (2 words, 5 cycles):

	call	0x07ED     ; 2 cycles + 2 cycles for ret I
	mov	__ihrcr, a   ; 1 cycle

But SDCC would have to assume that the called function might change p or a flag. SDCC could thus generate better code when it knows it is a read from code memory that would not do such things.

@spth spth reopened this Jul 11, 2020
@spth
Copy link
Author

spth commented Jul 11, 2020

SDCC can generate the best code for simple assignment:

#define FACTORY_IHRCR (*(const unsigned char *)0x87ed)
__sfr __at(0x0b) IHRCR;

void f(void)
{
	IHRCR = FACTORY_IHRCR;
}

Will give the same code as your function pointer use above, but now SDCC can keep p in use at the point of the read (since it knows there will be a ret k, which does not change registers, at the called location).

@serisman
Copy link
Collaborator

Ok, I may have found a better solution:
This code:

  typedef unsigned char (*funcptr)(void);
  #define GET_FACTORY_IHRCR ((funcptr)FACTORY_IHRCR_ADDR)
  #define PDK_USE_FACTORY_IHRCR_16MHZ()     IHRCR = GET_FACTORY_IHRCR()

... generates the same efficient assembly (2 words, 5 cycles):

	call	0x07ED     ; 2 cycles + 2 cycles for ret I
	mov	__ihrcr, a   ; 1 cycle

But SDCC would have to assume that the called function might change p or a flag. SDCC could thus generate better code when it knows it is a read from code memory that would not do such things.

Well, the called function does change a, so that pars is ok. It doesn't change p, but because the normal place to initialize this code (i.e. _sdcc_external_startup method) is generally kept so minimal, SDCC doesn't (usually) generate inefficient code for this.

@serisman
Copy link
Collaborator

SDCC can generate the best code for simple assignment:

#define FACTORY_IHRCR (*(const unsigned char *)0x87ed)
__sfr __at(0x0b) IHRCR;

void f(void)
{
	IHRCR = FACTORY_IHRCR;
}

Will give the same code as your function pointer use above, but now SDCC can keep p in use at the point of the read (since it knows there will be a ret k, which does not change registers, at the called location).

I still can't get this to generate good code.

Your specific example does generate efficient code, but the address isn't correct. I'm not sure where 0x87ed comes from, but SDCC changes that to 0x07ed for some reason and generates this:

	call	#0x07ed
	mov	__ihrcr, a

That might work for the PFS154, but not for other devices.

Whey I try to replace the 0x87ed with a real address (i.e. 0x0bed), SDCC reverts back to using the inefficient __gptrget code.

Can you explain where the 0x87ed comes from, and why SDCC is converting that to 0x07ed, and why using a real address doesn't generate the efficient code?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants