url shortening algo

SeppJ

Your algorithm will not shorten anything. It just makes a numerical representation out of a string and vice versa. In fact, the number (when written in base 10) will be longer, as you only have log_2(10) bits per character. That is also the reason, why even an unsigned long long will overflow. The numbers you are handling here are just way too large.

what does that mean in terms of code?
take the ID in the database of the URL and then convert it to either Base 36 [a-z0-9] (case insensitive) or Base 62 (case sensitive).

hustbaer

allen schrieb:

what does that mean in terms of code?
take the ID in the database of the URL and then convert it to either Base 36 [a-z0-9] (case insensitive) or Base 62 (case sensitive).

Well, using a database would be a start.
Getting a basic understanding of programming would also help a lot.

base32 is what this function should do... the id would come from the database... would that work?

#include <iostream>
#include <vector>
#include <string>
#include <algorithm>
using namespace std;

string decode(unsigned long long id) {
vector<char> sb;
vector<char> alphabet = {'0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f','g','h','j','k','m','n','p','q','r','s','t','v','w','x','y','z'};
size_t base = alphabet.size();

while (id > 0) {
size_t index = id % base;

sb.push_back(alphabet[index]);
id /= base;
}

string result(sb.cbegin(), sb.cend());

reverse_copy(sb.begin(), sb.end(), result.begin());

return result;
}

int main() {
// your code goes here

cout << decode(123) << endl;

return 0;
}

allen schrieb:

base32 is what this function should do... the id would come from the database... would that work?

Yes, assuming the id is nonzero.

Here's a cleaner version of your code:

#include <iostream>
#include <string>
#include <algorithm>
#include <cassert>
using namespace std;

string decode(unsigned long long id) {
  if (id == 0) return "0";

  const char alphabet[] = "0123456789abcdefghjkmnpqrstvwxyz";
  const size_t base = 32;
  assert(end(alphabet) - begin(alphabet) - 1 == base);

  string code;
  while (id > 0) {
    code.push_back(alphabet[id % base]);
    id /= base;
  }
  reverse(begin(code), end(code));
  return code;
}

int main() {
  cout << decode(123) << '\n';
}

SeppJ

allen schrieb:

base32 is what this function should do... the id would come from the database... would that work?

It does base36.

SeppJ schrieb:

allen schrieb:

base32 is what this function should do... the id would come from the database... would that work?

It does base36.

Wow, I didn't notice that he left out the letters 'i', 'l', 'o' and 'u' in his alphabet array. What he does is neither base32 nor base36.

SeppJ

during schrieb:

SeppJ schrieb:

allen schrieb:

base32 is what this function should do... the id would come from the database... would that work?

It does base36.

Wow, I didn't notice that he left out the letters 'i', 'l', 'o' and 'u' in his alphabet array. What he does is neither base32 nor base36.

Oops, me neither. So it is neither base36, nor what is commonly known as base32. It appears to be Crockford's Base32, but without checksums.

Wikipedia schrieb:

It also excludes the letter U to reduce the likelihood of accidental obscenity.

i missed out some characters...the following should fix the base32?

vector<char> alphabet = {'0','1','2','3','4','5','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'};

SeppJ

Does the exact encoding matter? You seem to understand the important principles, the choice of alphabet is just details.

What happens, if the ID is 0 or the last digit in base_whatever is 0 ?