algorithm: find whether the array contains a string with one character difference from the given string

hi,

hier eine keine algorithm frage:
Given a string and array of strings, find whether the array contains a
string with one character difference from the given string. Array may
contain string of different lengths.
bitte um feedback zu meiner loesung. bitte um andere loesungen welche eventuell besser sind.

#include <iostream>
#include <string>
#include <vector>
using namespace std;

const int alphabet_size = 26;

class node {
public:
	char key;
	node *next[alphabet_size];
	bool is_end;

	node() {
		for (int i = 0; i < alphabet_size; i++) {
			next[i] = NULL;
		}
		is_end = false;
	}
};

class trie {
private:
	node *root;

public:
	trie() {
		root = new node;
	}

	void insert(const string &key) {
		node *tmp = root;

		for (int i = 0; i < key.size(); i++) {
			char k = key[i] - 'a';

			if (!tmp->next[k]) {
				tmp->next[k] = new node;
				tmp->next[k]->key = key[i];
			}

			tmp = tmp->next[k];
			tmp->is_end = false;
		}

		tmp->is_end = true;
	}

	int count_matches(const string &s) {
		int count = 0;
		node *tmp = root;

		for (int i = 0; i < s.size(); i++) {
			char k = s[i] - 'a';

			if (tmp->next[k]) {
				count++;
				tmp = tmp->next[k];
			}
			else {
			    // curr. char is not a match, next char needs to be a match
			    // as long as the index is in s is at least total_index - 2
			    if (i < (s.size() - 1)) {
				    char next_k = s[i+1] - 'a';

				    // check if the next char is a match
				    for(unsigned int j = 0; j < alphabet_size; j++) {
					    if (tmp->next[j] && 
					        tmp->next[j]->next[next_k]) {
						    tmp = tmp->next[j];
						    break;
					    }
				    }
			    }
			    else {
			        break;
			    }
			}

			if (tmp->is_end) {
			    break;
			}
		}

		return count;
	}

	bool is_one_char_different(const string &s) {
		return (count_matches(s) == (s.size() - 1));
	}
};

int main() {
	/* Ex: Given string    
	banana
	and array is  [bana, apple, banaba, bonanza, banamf]
	and the output should be true as banana and banaba are one character difference.
	*/

	trie t;

	vector<string> vec = {"bana", "apple", "banaba", "bonanza", "banamf"};
	for (auto &e: vec) {
		t.insert(e);
	}

	cout << t.is_one_char_different("banana") << endl;

	return 0;
}

Jockelx

Zumindest könntest du deine 'trie'-Klasse in 'Tree' umbennen.

KN4CK3R

Trie ist richtig, auch wenns komisch aussieht.

Ich sehe new aber kein delete => Speicherleck! Abhilfe: Smartpointer.

SeppJ

Regel der großen Drei/Fünf nicht beachtet. Das kommt davon, wenn man denkt, selber Speicher verwalten zu müssen.

@manni66: stimmt. trie delete ist noch nicht implementiert.

sonst comments zu algo?

SeppJ

Ich bin irgendwie nicht so glücklich damit, dass du deinen Algorithmus ohne Not auf Kleinbuchstaben beschränkst. Lass doch einfach alle möglichen Zeichen zu! Derzeit baut dein Programm totalen Mist, wenn ein nicht-Kleinbuchstabe kommt. Es wäre ja sogar einfacher, wenn du diese Beschränkung einfach weg ließest.

volkard

algoboy schrieb:

sonst comments zu algo?

Sieht für mich ganz falsch aus. Hab ein ungutes Gefühl, wie das erste Zeichen platt als Treffer verbucht wird, da könnte "ba" das "panana" verdecken. Hab ein ungutes Gefühl, wie bei der ersten Möglichkeit des Überspringens diese und nur diese benutzt wird, da könnte "benan" ein "banena" verdecken.

Auf jeden Fall musst Du wegkommen vom Erstellen von Testfällen per Hand. Bau eine lahme narrensicherere Referenzimplementierung und erzeuge in der main zufällige Strings und lass Deine Implementierung 1000000 mal gegen die Referenzimplementierung antreten. Sorge per Feintuning (min_wordlength, max_wordlength, vec_size) dafür, daß nicht fast alle Testfälle true und nicht fast alle Testfälle false ergeben.

Jockelx

KN4CK3R schrieb:

Trie ist richtig, auch wenns komisch aussieht.

Oh, wieder was gelernt. Hab ich noch nie gehört und sah einfach nur falsch aus.

5cript

Ich hätte ja Levenshtein Distanz implementiert:

https://en.wikipedia.org/wiki/Levenshtein_distance
oder auch
https://en.wikibooks.org/wiki/Algorithm_Implementation/Strings/Levenshtein_distance#C.2B.2B

Was musst du optimieren? Wie schnell darf das Preprocessing sein und wie schnell das Query? Es macht einen Unterschied ob es nur ein Query gibt oder mehrere.

Levenshtein Distanz wuerde das problem auch loesen jedoch ist die komplexitaet O(n*m) ...