Download program and sources in zip archive: dfcfr11.zip
dfcfr: VIT(R) Duplicate File Converter/Finder/Remover
Version 1.1 Copyright (C) Vitaliy I. Vasiliev 17.10.2000
Vitaliy Vasiliev homepage: http://www.chat.ru/~vitaliy_vasiliev/
Vitaliy Vasiliev homepage mirror: http://free.prohosting.com/~vitivas/

USAGE: dfcfr [switches]
 switches:
  -v                  verbose
  -partCrc:0x???????  size of the leading part of each file used for CRC32
                      (default: first 4096 bytes)
  -fileLog:<fileLog>  output LOG file (default "fileLog.txt")
  -convLog:<convLog>  converter's output LOG file (default "convLog.txt")
  -rprtTxt:<rprtTxt>  output report file (default "report.txt")
  -saveLst:<saveLst>  save sorted list of files to <saveLst> (default: not saved)
  -baseDir:<baseDir>  set base directory (default ".")
  -tempDir:<tempDir>  set base TEMP directory (default "TEMP")
  -convCmd:<convCmd>  set program to convert files (must support:
                      arg1=input_file, arg2=output_file)
  -bckpDir:<bckpDir>  set directory for backups (default "#backup$")
  -bckpDel            delete backups immediately after converting with <convCmd>
  -dupsDel            move duplicate files to <tempDir> (default: only report)

This package is distributed as freeware.

DESCRIPTION.

The program finds and automatically removes files with identical content, and also detects complete and incomplete versions of the same file. It can be very useful for maintaining a collection of files (pictures, music, text files, and so on).

To learn what the program can do, it helps to understand how it works. First, the given base directory is scanned recursively (descending into subdirectories) and a list of files is built in memory (directories themselves are not added to this list). The base directory can be set on the command line, for example "-baseDir:C:\Program Files"; by default it is the current directory (as if -baseDir:. had been given). The list of files is then sorted. Next, the program reads the contents of all files and computes a CRC32 of each file's contents.
More precisely, only the first small chunk of each file is loaded (4096 bytes by default, but another value can be set, for example "-partCrc:8192", or even "-partCrc:0xFFFFFF" so that the CRC covers the whole file for any file smaller than 16 MB). Taking the CRC32 of only a small leading part of the file is useful both because the checksums are collected faster and because it allows incomplete (for example, partially downloaded) files to be found.

If the "-convCmd:[conversion command]" option is given, the specified command is run on each file before the file is read for its CRC. If the converter program writes nothing to the output file, dfcfr restores the file from a backup copy of the file currently being processed; dfcfr makes this backup beforehand in the backup directory. That directory can be set with the -bckpDir option (for example, "-bckpDir:C:\Backups to del").

As a sample converter, the dfcfr package includes the jpgopt v1.3 program. It is a quite useful, slightly hackish tool, also freely distributed with source code; its author likewise accepts no obligation to support it and bears no responsibility for its reliability, its fitness, or any damage to files or to the computer caused by using it. JPGOPT does not change the image data in JPEG files; it only strips junk from them: printer settings, scanner settings, and the preview image (for example, previews saved by PHOTOSHOP can be shown only by Photoshop itself, and only in its "Open File" window). Optimizing with jpgopt matters most for small jpg files: a 4 KB jpeg file often shrinks to 2 KB.

At this point DFCFR has collected the CRC32 hashes of all files. It now has to find files with equal CRC values and compare their contents. To do this, the array of CRC32 values is sorted, then the code walks through the sorted list and passes to a special function the lists of file names that share the same CRC32.
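The prefix-CRC and grouping steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the program's own code: it works on in-memory strings instead of files, it assumes the standard reflected CRC-32 polynomial (0xEDB88320), and the names crc32Prefix, FileCrc and groupByCrc are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// CRC-32 of at most the first partCrc bytes of data.
// Assumption: standard reflected CRC-32 (polynomial 0xEDB88320).
static uint32_t crc32Prefix(const std::string& data, std::size_t partCrc) {
    static uint32_t table[256];
    static bool ready = false;
    if (!ready) {                       // build the lookup table once
        for (uint32_t n = 0; n < 256; n++) {
            uint32_t c = n;
            for (int k = 0; k < 8; k++)
                c = (c & 1u) ? (0xEDB88320u ^ (c >> 1)) : (c >> 1);
            table[n] = c;
        }
        ready = true;
    }
    std::size_t len = std::min(data.size(), partCrc);
    uint32_t crc = 0xFFFFFFFFu;
    for (std::size_t i = 0; i < len; i++)
        crc = table[(crc ^ static_cast<unsigned char>(data[i])) & 0xFFu] ^ (crc >> 8);
    return crc ^ 0xFFFFFFFFu;
}

// Hypothetical record: a file name and the CRC of its leading part.
struct FileCrc {
    std::string name;
    uint32_t crc;
};

// Sorts by CRC (then by name) and returns every group of two or more
// names that share one CRC; these groups are the candidates that still
// need a byte-by-byte comparison.
static std::vector<std::vector<std::string> > groupByCrc(std::vector<FileCrc> files) {
    std::sort(files.begin(), files.end(), [](const FileCrc& a, const FileCrc& b) {
        return a.crc != b.crc ? a.crc < b.crc : a.name < b.name;
    });
    std::vector<std::vector<std::string> > groups;
    for (std::size_t i = 0; i < files.size(); ) {
        std::size_t j = i;
        while (j < files.size() && files[j].crc == files[i].crc) j++;
        if (j - i >= 2) {
            std::vector<std::string> group;
            for (std::size_t k = i; k < j; k++) group.push_back(files[k].name);
            groups.push_back(group);
        }
        i = j;
    }
    return groups;
}
```

Because only the leading part is hashed, a complete file and a truncated copy of it land in the same group as long as both are at least partCrc bytes long; the full comparison afterwards decides which one is kept.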
That function compares file contents pairwise: duplicates and incomplete files are moved to [tempDir] (if "-dupsDel" is given), and entries are written to the LOG file and to "report.txt".

The rules for choosing which files are removed deserve a separate explanation. If files are found whose common part is identical but whose lengths differ, the shorter file is removed and "Eq1:" or "Eq2:" is written to "report.txt". For example:
> Eq1: "files/first10KB" "files/only5KB"
> Eq2: "files/first10KB" "files/fully45KB"
In the first comparison "first10KB" turned out to be longer than "only5KB", so "only5KB" was removed. In the second comparison the second file of the pair turned out to be longer (hence Eq2): "fully45KB" is longer than "first10KB", so "first10KB" was removed.

If the files differ, "report.txt" receives
> Eq-: "files/changed1ByteAt10KB" "files/fullly45KB"
and nothing is removed. Generally speaking, this means that the CRC32 of the first 4 KB matches, which is a sure sign that the files are very similar: possibly one of them was corrupted during copying and has incorrect content, or one of them is simply a slightly modified version (with differences after the first 4 KB).

If the files are identical, the file that comes later in the sorted (case-sensitive) list is removed.
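The four report outcomes described above (Eq, Eq1, Eq2, Eq-) can be summarized in a small sketch. It is an illustration only: it compares in-memory strings rather than files, the names Verdict and classify are hypothetical, and it assumes that two files of different length whose common part differs are also reported as "Eq-", which the description does not state explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical names for the four report markers.
enum class Verdict { Eq, Eq1, Eq2, EqMinus };

// Classifies a pair of file contents that already share a prefix CRC.
static Verdict classify(const std::string& first, const std::string& second) {
    std::size_t common = first.size() < second.size() ? first.size() : second.size();
    if (first.compare(0, common, second, 0, common) != 0)
        return Verdict::EqMinus;   // "Eq-": contents differ, nothing is removed
    if (first.size() == second.size())
        return Verdict::Eq;        // "Eq ": identical, the later-sorted file is removed
    return first.size() > second.size()
        ? Verdict::Eq1             // "Eq1": first is longer, second is removed
        : Verdict::Eq2;            // "Eq2": second is longer, first is removed
}
```

For instance, classify("abcdef", "abc") yields Verdict::Eq1, matching the "first10KB" versus "only5KB" case in the report lines above.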
For example, this list of files with identical content:
> coll01F\fullly45KB
> LIST01\fullly45KB
> abba45KB
> fullly45KB
> Fullly45KB_
> Zine45KB
will be sorted, case-sensitively, into this list:
> Fullly45KB_
> LIST01/fullly45KB
> Zine45KB
> abba45KB
> coll01F/fullly45KB
> fullly45KB
and "report.txt" receives:
> Eq : "Fullly45KB_" "fullly45KB"
> Eq : "Fullly45KB_" "coll01F/fullly45KB"
> Eq : "Fullly45KB_" "abba45KB"
> Eq : "Fullly45KB_" "Zine45KB"
> Eq : "Fullly45KB_" "LIST01/fullly45KB"
Only "Fullly45KB_" is kept, and the others end up in the TEMP directory (provided, of course, that the "-dupsDel" option is given):
> TEMP/Zine45KB
> TEMP/abba45KB
> TEMP/coll01F/fullly45KB
> TEMP/fileLog.txt
> TEMP/fullly45KB
You can check the result and, if everything is fine, delete "TEMP".

In general, it is best to keep the higher-quality collections in directories whose names sort closer to the beginning of the list. For example, files that were already selected and grouped earlier can go into the "CD1" directory, partially reviewed files into "CD2", and the new, still unsorted pile of incoming files into "CD3"; then most of the repeated files will be removed from "CD3".

HISTORY.
v1.0 15.10.2000
 + first version
v1.1 17.10.2000
 + some fixes, enhancements, command line options
 + file-open errors are now handled (duplicates can now be searched for in the Windows directory)
 + added ReadMeRU.txt

// dfcfr.cpp
// dfcfr: VIT(R) Duplicate File Converter/Finder/Remover
// Copyright (C) Vitaliy I. Vasiliev
// Vitaliy Vasiliev homepage: http://www.chat.ru/~vitaliy_vasiliev/
// Vitaliy Vasiliev homepage mirror: http://free.prohosting.com/~vitivas/

#define MAXPATH_len 10240

#ifdef _MBCS // set under MsVC DevStudio
#define MSVC
#endif

#include <sys/types.h>
#include <sys/stat.h>
#include <stdio.h>
#include <math.h>
#include <time.h>

#ifdef WIN32
#define SWITCHCHAR0 '/'
#define SWITCHCHAR1 '-'
#define PATHDELIM '\\'
#define StringPATHDELIM "\\"
#include <direct.h>
#include <string.h> // for memset() under CygWin
#ifdef _GNU_SOURCE // WIN32 + CygWin:
#define Win32_Winsock 1
#include <Windows32/Sockets.h>
#include <windows.h>
#include <stdlib.h>
#ifdef __CYGWIN32__
#undef __CYGWIN32__
#endif
#include <unistd.h> // contains unlink(char* c), but "unistd.h" absent in MSVC
#include <malloc.h>
//#include <winsock.h>
#include <process.h> // under CygWin
#include <dirent.h>
#else // WIN32 + MSVC or WC:
#ifdef MSVC
#include <afxwin.h> // for MessageBox() under MS Visual C needs <afxwin.h>
#include <afxinet.h>
#else
#include <windows.h> // for WATCOM
#endif
#include <process.h> // "#define _MT" not necessary if using MFC
#include <mapiwin.h>
#include <io.h>
#include <winsock.h>
#endif
#else
#define HAVE_DIRENT
#define SWITCHCHAR0 '-'
#define SWITCHCHAR1 '-'
// unix, gcc: HAVE_DIRENT, HAVE_PTHREAD:
#define PATHDELIM '/'
#define StringPATHDELIM "/"
#include <sys/timeb.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <utime.h> // utime(...) - Set the access and modification times of FILE
#include <netdb.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>
#include <unistd.h> // contains unlink(char* c), but "unistd.h" absent in MSVC
#include <dirent.h>
#endif

#define Def_allocErrMsg "malloc(): Can't alloc. 
%d*%d bytes memory: ErrorDescAt: %s" #define Def_deleteErrMsg "free(): Attempt to delete NULL pointer" #define Mac_new(ptr,type,size,errMsg) ptr = (type*)malloc((size)*sizeof(type)); \ if(ptr==NULL) showError(allocErrMsg, sizeof(type), size, errMsg) #define Mac_newAddCount(ptr,type,size,errMsg) ptr = (type*)malloc((size)*sizeof(type)); \ if(ptr==NULL) showError(allocErrMsg, sizeof(type), size, errMsg); #define Mac_delete(ptr) if(ptr!=NULL) free(ptr); \ else showError(deleteErrMsg) char *allocErrMsg = Def_allocErrMsg; char *deleteErrMsg = Def_deleteErrMsg; int tzSecondsShift; unsigned long partCrc=0; unsigned long numOfDupFiles=0; int flag_toStopAndQuit=0; int bckpDel=0; int dupsDel=0; #ifdef WIN32 char *fileLog = "NUL"; #else char *fileLog = "/dev/null"; #endif char *rprtTxt = NULL; char *convLog = NULL; char *convCmd = NULL; char *tempDir = NULL; char *bckpDir = NULL; char *saveLst = NULL; typedef struct arrLongsToCopyPrintfArgs { unsigned long l0; unsigned long l1; unsigned long l2; unsigned long l3; unsigned long l4; unsigned long l5; unsigned long l6; unsigned long l7; unsigned long l8; unsigned long l9; unsigned long lA; unsigned long lB; unsigned long lC; unsigned long lD; unsigned long lE; unsigned long lF; } ARRLONGS16; void showMsg(char *sz,...); #define DEBUG_LEVEL 5 #if (DEBUG_LEVEL >= 0) # define Mac_logPr0(x) showMsg x #else # define Mac_logPr0(x) #endif #if (DEBUG_LEVEL >= 1) # define Mac_logPr1(x) showMsg x #else # define Mac_logPr1(x) #endif #if (DEBUG_LEVEL >= 2) # define Mac_logPr2(x) showMsg x #else # define Mac_logPr2(x) #endif #if (DEBUG_LEVEL >= 3) # define Mac_logPr3(x) showMsg x #else # define Mac_logPr3(x) #endif #if (DEBUG_LEVEL >= 4) # define Mac_logPr4(x) showMsg x #else # define Mac_logPr4(x) #endif #if (DEBUG_LEVEL >= 5) # define Mac_logPr5(x) showMsg x #else # define Mac_logPr5(x) #endif #if (DEBUG_LEVEL >= 6) # define Mac_logPr6(x) showMsg x #else # define Mac_logPr6(x) #endif #if (DEBUG_LEVEL >= 7) # define Mac_logPr7(x) showMsg x 
#else # define Mac_logPr7(x) #endif #if (DEBUG_LEVEL >= 8) # define Mac_logPr8(x) showMsg x #else # define Mac_logPr8(x) #endif #if (DEBUG_LEVEL >= 9) # define Mac_logPr9(x) showMsg x #else # define Mac_logPr9(x) #endif typedef unsigned long UCRC; #define CRC_MASK 0xFFFFFFFFUL #define UPDATE_CRC(crc, c) \ crc = crcTable[(unsigned char)crc ^ (unsigned char)(c)] ^ (crc>>8) UCRC crcTable[256]; #ifdef HAVE_DIRENT void *dirReader_dir; void *dirReader_rec; int lastRetFindNextFile; #else void *dirReader_WIN32_FIND_DATA; void *dirReader_handle; int lastRetFindNextFile; #endif int optionVerbose = 0; //int optionQuiet = 0; unsigned char inHex[256] = { /*00*/ 255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255, /*10*/ 255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255, /*20*/ 255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255, /*30*/ 0 , 1 , 2 , 3 , 4 , 5 , 6 , 7 , 8 , 9 ,255,255,255,255,255,255, /*40*/ 255,10 ,11 ,12 ,13 ,14 ,15 ,255,255,255,255,255,255,255,255,255, /*50*/ 255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255, /*60*/ 255,10 ,11 ,12 ,13 ,14 ,15 ,255,255,255,255,255,255,255,255,255, /*70*/ 255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255, /*80*/ 255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255, /*90*/ 255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255, /*A0*/ 255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255, /*B0*/ 255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255, /*C0*/ 255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255, /*D0*/ 255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255, /*E0*/ 255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255, /*F0*/ 255,255,255,255,255,255,255,255,255,255,255,255,255,255,255,255 }; void waitPressEnter(void) { char itmp[MAXPATH_len]; printf("\rPress ENTER to continue..."); fflush(stdout); fread(itmp, 1, 1, stdin); printf("\r \r"); } void logoAndVersion(void) { printf("\n" 
"dfcfr: VIT(R) Duplicate File Converter/Finder/Remover\n" " Version 1.1 Copyright (C) Vitaliy I. Vasiliev 17.10.2000\n" " Vitaliy Vasiliev homepage: http://www.chat.ru/~vitaliy_vasiliev/\n" " Vitaliy Vasiliev homepage mirror: http://free.prohosting.com/~vitivas/\n" ); } void usage(void) { logoAndVersion(); printf("\n" "USAGE: dfcfr [switches]\n" " switches:\n" " -v verbose mode output\n" " -partCrc:0x??????? beg part of file to get CRC32 (default first 4096 Bytes)\n" " -fileLog:<fileLog> output LOG file (default \"fileLog.txt\")\n" " -convLog:<convLog> converter's output LOG file (default \"convLog.txt\")\n" " -rprtTxt:<rprtTxt> output report file (default \"report.txt\")\n" " -saveLst:<saveLst> save sorted list of files to <saveLst>(default not save)\n" " -baseDir:<baseDir> set base directory (default \".\")\n" " -tempDir:<tempDir> set base TEMP directory (defaut \"TEMP\")\n" " -convCmd:<convCmd> set program to convert files (must support:\n" " arg1=input_file, arg2=output_file)\n" " -bckpDir:<bckpDir> set directory for backups (default \"#backup$\")\n" " -bckpDel delete backups immediately after convert with <convCmd>\n" " -dupsDel move duplicate files to <tempDir> (default: only report)\n" ); exit(127); } unsigned long getArgVal(char* ptrSrc) { char *ptrStart; unsigned long val = 0; ptrStart = ptrSrc; if(*ptrSrc == 0) { printf("\nError in arguments: constant value expected.\n"); exit(-1); } if((*(unsigned short *)ptrSrc != '0' + 256 * 'x') && (*ptrSrc != '$')) { while(*ptrSrc >= '0' && *ptrSrc <= '9') { val *= 10; val += (unsigned long)(*(ptrSrc++) - '0'); } } else { // if beg "0x" or "$" ==> hexadecimal ptrSrc++; if(*ptrSrc == 'x') ptrSrc++; ptrSrc--; while(inHex[*++ptrSrc] < 0x10) { val *= 0x10; val += (unsigned long)(inHex[*ptrSrc]); } } if(*ptrSrc != 0) { printf("\ninvalid constant value: '%s'\n",ptrStart); exit(-1); } return(val); } unsigned long getArg(char **retOutStr, char *argv[], char *argN, char *dflt) { //int optionVerbose = 1; // if(retOutStr == 
NULL) get argument/switch value // 'argN' is argument number ("1", "2",..."99") or switch name ("-q", "-b") // 'dflt' is default returning value - use NULL if value must be defined // in command line // will be returned pointer: // for arg - pointer to arg or 'dflt' if arg not def. in // command line and dflt!=NULL // for arg=value - pointer to "value" or pointer to 'dflt' // for arg:value - pointer to "value" or pointer to 'dflt' // for swith as -q[Y|N] - pointer to cahr after "-q" or 'dflt' if switch // not def. in command line // for swith as -b1234 - pointer to "1234" or pointer to 'dflt' // for swith as -b=1234 - pointer to "1234" or pointer to 'dflt' // for swith as -b 1234 - pointer to "1234" (next non-switch command line // arg) or pointer to 'dflt' unsigned long retVal = 0; char *outPtr = dflt; int a = 0; int i = 0; int an = -1; if(argN[0] == '-') { // if switch for(an = 0; argN[an + 1] != 0; an++) ; for(a = 1; argv[a] != NULL; a++) { // for every command line arg if(argv[a][0] == SWITCHCHAR0 || argv[a][0] == SWITCHCHAR1) { for(i = 0; i < an; i++) if(argv[a][i + 1] != argN[i + 1]) break; if(i != an) continue; break; } else { // if argument - may be "switch=value" - checking: for(i = 0; i < an; i++) if(argv[a][i] != argN[i + 1]) break; if(i != an) continue; if(argv[a][i] != ':') continue; if(argv[a][i] != '=') continue; Mac_logPr4(("switch=value: argv[a]=%s\n", argv[a])); //pause(); continue; } } if(argv[a] != NULL) { i++; if(argv[a][i] == ':') i++; if(argv[a][i] == '=') i++; outPtr = &argv[a][i]; if(retOutStr == NULL && argv[a][0] == '-' && (argv[a][1] == 'q' || argv[a][1] == 'Q') && argv[a][2] == 0) return 1; if(optionVerbose) Mac_logPr4(("accept CmdLn switch %s", argN)); if(retOutStr == NULL && dflt[1] == 0 && dflt[0] == '+') { if(optionVerbose) Mac_logPr4(("\n")); return 1; } if(retOutStr == NULL && dflt[1] == 0 && dflt[0] == '-') { if(optionVerbose) Mac_logPr4(("\n")); return 0; } } else { if(retOutStr == NULL && dflt[1] == 0 && dflt[0] == '+') return 
0; if(retOutStr == NULL && dflt[1] == 0 && dflt[0] == '-') return 1; if(optionVerbose) Mac_logPr4(("absent CmdLn switch %s", argN)); if(outPtr == NULL) { Mac_logPr3(("\n Error in command line arguments: must be defined argunment %s.\n", argN)); usage(); // and exit(); } else { if(optionVerbose) Mac_logPr4((": using default")); } } if(retOutStr == NULL) { retVal = getArgVal(outPtr); if(optionVerbose) Mac_logPr4((": val %ld (0x%lX) parsed \"%s\"\n", retVal, retVal, outPtr)); return retVal; } //i=0; while((retOutStr[i] = outPtr[i]) != 0) i++; //strcpy(retOutStr, outPtr); *retOutStr = outPtr; // place pointer to return string if(optionVerbose) Mac_logPr4((": str \"%s\"\n", *retOutStr)); return 0; } else { // find (an)-argument an = getArgVal(argN); for(a = 1; argv[a] != NULL; a++) { // for every command line arg if(argv[a][0] == SWITCHCHAR0) continue; // skip switch if(argv[a][0] == SWITCHCHAR1) continue; // skip switch if(--an == 0) break; } if(an == 0) { outPtr = argv[a]; if(optionVerbose) Mac_logPr4(("accept CmdLn argument %s", argN)); } else { if(optionVerbose) Mac_logPr4(("absent CmdLn argument %s", argN)); if(outPtr == NULL) { Mac_logPr4(("\n Error in command line arguments: must be defined argunment %s.\n", argN)); usage(); // and exit(); } else { if(optionVerbose) Mac_logPr4((": using default")); } } if(retOutStr == NULL) { retVal = getArgVal(outPtr); if(optionVerbose) Mac_logPr4((": val %ld (0x%lX) parsed \"%s\"\n", retVal, retVal, outPtr)); return retVal; } //i=0; while((retOutStr[i] = outPtr[i]) != 0) i++; //strcpy(retOutStr, outPtr); *retOutStr = outPtr; // place pointer to return string if(optionVerbose) Mac_logPr4((": str \"%s\"\n", *retOutStr)); return (unsigned long)-1; } } char *readFile(char *fileName, unsigned long *retLen) { FILE *fileHandle; char *retContent; if((fileHandle = fopen(fileName, "rb")) == NULL) { printf("Warning: can't open for read file: '%s'\n", fileName); return NULL; } fseek(fileHandle, 0L, SEEK_END); *retLen = ftell(fileHandle); 
  fseek(fileHandle, 0L, SEEK_SET);
  retContent = (char*)malloc(*retLen);
  if(retContent==NULL) {
    printf("can't malloc(%lu)", *retLen);
    fclose(fileHandle);
    return NULL;
  }
  if((fread(retContent, *retLen, 1, fileHandle)) != 1) {
    printf("Warning: can't read from file: '%s' %lu bytes\n", fileName, *retLen);
    free(retContent); // allocated with malloc(), so it must be released with free(), not delete
    retContent = NULL;
  }
  fclose(fileHandle);
  return retContent;
}

void writeToFile(char *fileName, void *writeBuf, unsigned long len) {
  FILE *fileHandle;
  if((fileHandle = fopen(fileName, "wb")) == NULL) {
    printf("Warning: can't create file: '%s'\n", fileName);
    return;
  }
  if((fwrite(writeBuf, len, 1, fileHandle)) != 1) {
    printf("Warning: can't write to file: '%s' %lu bytes\n", fileName, len);
  }
  fclose(fileHandle);
}

void showError(char *sz,...) {
#ifdef WIN32
  unsigned long ulng;
  LPVOID lpMsgBuf;
#endif
  printf("Error: ");
  // Note: the variadic arguments are forwarded by copying raw stack words
  // after 'sz'; this relies on a stack-based calling convention and is
  // not portable.
  printf(sz, *(ARRLONGS16*)(&sz+1));
  printf("\n");
#ifdef WIN32
  ulng = GetLastError();
  FormatMessage(
    FORMAT_MESSAGE_ALLOCATE_BUFFER | FORMAT_MESSAGE_FROM_SYSTEM,
    NULL,
    ulng,
    MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT), // Default language
    (LPTSTR) &lpMsgBuf,
    0,
    NULL);
  // Display the string:
  printf("showError(): GetLastError()=%lu\nMsg='%s'\n", ulng, (char*)lpMsgBuf);
  // Free the buffer:
  LocalFree(lpMsgBuf);
#endif
  //waitPressEnter();
}

// Replaces every occurrence of charSearch with charReplaceTo in buf[0..bufLen-1].
void replaceChars(char charSearch, char charReplaceTo, char *buf, unsigned long bufLen) {
  while(bufLen--) {
    if(*buf==charSearch) *buf=charReplaceTo;
    buf++;
  }
}

void makeDir(char *dirName) {
#ifdef _GNU_SOURCE
  mkdir(dirName, 0755);
#else
  mkdir(dirName);
#endif
}

void alert(char *sz,...)
{ int i = 0; char *stringBuf; printf(sz, *(ARRLONGS16*)(&sz+1)); #ifndef _GNU_SOURCE while(sz[i++]) ; MessageBeep(MB_ICONHAND); Mac_new(stringBuf, char, (i+MAXPATH_len), "alert() 1"); sprintf(stringBuf, sz, *(ARRLONGS16*)(&sz+1)); MessageBox(NULL, stringBuf, "alert(char *sz,...)", MB_OK|MB_ICONINFORMATION); Mac_delete(stringBuf); #endif } void time_t_to_tm(unsigned long t, struct tm *plg_tmStruc) { // struct tm { // from <time.h> // int tm_sec; // seconds after the minute -- [0,59] // int tm_min; // minutes after the hour -- [0,59] // int tm_hour; // hours after midnight -- [0,23] // int tm_mday; // day of the month -- [1,31] // int tm_mon; // months since January -- [0,11] // int tm_year; // years since 1900 // int tm_wday; // days since Sunday -- [0,6] // int tm_yday; // days since January 1 -- [0,365] // int tm_isdst; // Daylight Savings Time flag */ // }; unsigned long tt; plg_tmStruc->tm_sec = t%60; t/=60; plg_tmStruc->tm_min = t%60; t/=60; plg_tmStruc->tm_hour = t%24; t/=24; // So, here t=days_from_1970_year plg_tmStruc->tm_year = tt = 1968-1900 + ((t+365+366)*4)/(365*4+1); // years // computation: days since 1 jan of nearest 4*year: t = t+365+366 - (unsigned long)((t+365+366)/(365*4+1)) * (365*4+1); if(t<366) { // visokosniy year t=t%366; if(t>=31+29) t++; // (december 29 days-->30 days) } else { t=(t-366)%365; if(t>=31+28) t++; // (december 29 days-->30 days) if(t>=31+28) t++; } if(t<=31+30+31+30+31+30+31) { tt = t%(31+30); // day of 2*month (0...30) t = t/(31+30); // 2*month of year t = t*2 + tt/31; // month of year tt = tt%31; // day of 2*month (0...30) } else { t--; tt = t%(31+30); // day of 2*month (0...30) t = t/(31+30); // 2*month of year t = t*2; if(tt>=30) { t++; // month of year tt-=30; // day of 2*month (0...30) } } plg_tmStruc->tm_mon = t; // t = month: 0...11 plg_tmStruc->tm_mday = tt+1; // tt = day: 0...30 --> 1...31 } time_t tm_to_time_t(struct tm *tmStruc) { // May be it's a my analog of: // LocalToEpoch(year,month,day,hour,minute,second); // 
(http://wood.lesobank.ru/docs/fpc/units/node340.html) // Note: to convert tm --> time_t available library functions: // tmStruc = gmtime(&timeOfDayGmt); // tmStruc = localtime(&timeOfDayGmt); // struct tm { // from <time.h> // int tm_sec; // seconds after the minute -- [0,61] // int tm_min; // minutes after the hour -- [0,59] // int tm_hour; // hours after midnight -- [0,23] // int tm_mday; // day of the month -- [1,31] // int tm_mon; // months since January -- [0,11] // int tm_year; // years since 1900 // int tm_wday; // days since Sunday -- [0,6] // int tm_yday; // days since January 1 -- [0,365] // int tm_isdst; // Daylight Savings Time flag */ // }; unsigned long i, date, month, year; date = tmStruc->tm_mday; // now: date=1...31 month = tmStruc->tm_mon; // now: month=0...11 year = 1900+tmStruc->tm_year; // now: year=0...9999 if(year<1970) return 0; year-=1970; i = year*366 + month*31 + (date-1); // subtracting number of full non-visokosnih years: i -= ((year+2)*3/4 - 1); // subtracting number of 30-day-months: if(month<7) i-=(month+0)/2; else i-=(month-1)/2; // subtracting 28-day-feb (if after): if(((year+2)%4) == 0) { // if visokosniy year: if(month>1) i-=1; } else { // if non-visokosniy year: if(month>1) i-=2; } i *= 24; i += tmStruc->tm_hour; i *= 60; i += tmStruc->tm_min; i *= 60; i += tmStruc->tm_sec; // WATCOM'S RINTIME after feb 2100: // // skip 1 day 01.03.2100 (in feb 2100) // WATCOM'S RINTIME after feb 2100: // if(i>=0xF4D57100) { // WATCOM'S RINTIME after feb 2100: // i-=60*60*24; // 0x15180 // WATCOM'S RINTIME after feb 2100: // } // WATCOM'S RINTIME after feb 2100: // if(i>=0xF72E9D00) { // WATCOM'S RINTIME after feb 2100: // i+=60*60*24; // WATCOM'S RINTIME after feb 2100: // } return i; } time_t getTzSecondsShift() { time_t timeOfDayGmt; time_t timeOfDayLocal; struct tm *tmStruc; timeOfDayGmt = 10000000; // time(NULL); tmStruc = localtime(&timeOfDayGmt); timeOfDayLocal = tm_to_time_t(tmStruc); return timeOfDayLocal-timeOfDayGmt; } void 
setFileDateTime(char *fileName, unsigned long time_t_setDateTime) {
  // LocalToEpoch(year,month,day,hour,minute,second);
#ifdef WIN32
  struct tm tmStruc; // fields as described in the 'struct tm' comment in time_t_to_tm()
  // typedef struct _SYSTEMTIME { // st
  //   WORD wYear;
  //   WORD wMonth;
  //   WORD wDayOfWeek;
  //   WORD wDay;
  //   WORD wHour;
  //   WORD wMinute;
  //   WORD wSecond;
  //   WORD wMilliseconds;
  // } SYSTEMTIME;
  SYSTEMTIME sysTime;
  struct _FILETIME fCreationTime;   // time the file was created
  struct _FILETIME fLastAccessTime; // time the file was last accessed
  struct _FILETIME fLastWriteTime;  // time the file was last written (lastModify)
  // set the file's creation date and time under NT
  HANDLE hndl = CreateFile(
    fileName,              // pointer to name of the file
    GENERIC_WRITE,         // access (write) mode
    FILE_SHARE_READ,       // share mode
    NULL,                  // pointer to security attributes
    OPEN_EXISTING,         // how to create
    FILE_ATTRIBUTE_HIDDEN, // file attributes
    NULL                   // handle to file with attributes to copy
  );
  if(hndl == INVALID_HANDLE_VALUE) {
    // without a valid handle the SetFileTime()/CloseHandle() calls below would fail
    showError("setFileDateTime(): can't open file: '%s'", fileName);
    return;
  }
  //CoDosDateTimeToFileTime(ElemDate, ElemTime, &fCreationTime);
  //LocalFileTimeToFileTime(&fCreationTime, &fCreationTime);
  time_t_to_tm(time_t_setDateTime-tzSecondsShift, &tmStruc);
  sysTime.wYear = tmStruc.tm_year+1900;
  sysTime.wMonth = tmStruc.tm_mon+1;
  sysTime.wDay = tmStruc.tm_mday;
  sysTime.wHour = tmStruc.tm_hour;
  sysTime.wMinute = tmStruc.tm_min;
  sysTime.wSecond = tmStruc.tm_sec;
  sysTime.wMilliseconds = 0;
  sysTime.wDayOfWeek = 0;
  //GetSystemTime(&sysTime);
  if(SystemTimeToFileTime(&sysTime, &fCreationTime) == 0) {
    showError("Err SystemTimeToFileTime()");
    alert("Err SystemTimeToFileTime()");
  }
  fLastAccessTime = fCreationTime;
  fLastWriteTime = fCreationTime;
  SetFileTime(hndl, NULL/*&fCreationTime*/, NULL/*&fLastAccessTime*/, &fLastWriteTime);
  if(CloseHandle(hndl) == 0) {
    printf("Err CloseHandle\n");
    alert("Err CloseHandle");
  }
#else
  // under unix:
  // utime - set the access and modification times of a file
  // http://www.citforum.ru/operating_systems/manpages/UTIME.2.shtml
  // read file's attributes, Access time, Modification time:
  // http://www.citforum.ru/operating_systems/manpages/STAT.2.shtml
  struct utimbuf utimbufTimes;
  utimbufTimes.actime = time_t_setDateTime-tzSecondsShift;  // Access time
  utimbufTimes.modtime = time_t_setDateTime-tzSecondsShift; // Modification time
  if(utime(fileName, &utimbufTimes)!=0) {
    Mac_logPr5(("setFileDateTime() under unix: can't utime('%s', time_t_setDateTime=0x%X) !!!\n", fileName, time_t_setDateTime));
    return;
  }
#endif
}

// Writes the buffer to a temporary file (creating missing directories if
// needed), optionally sets its date/time, then renames it over fileName.
void advWriteToFile(char *fileName, void *writeBuf, unsigned long len, unsigned long time_t_setDateTime) {
  int j;
  FILE *fileHandle;
  char tmpFileName[MAXPATH_len];
  int nameLen=0;
  while(fileName[nameLen]) nameLen++;
  if(nameLen>220) {
    printf("advWriteToFile(): nameLen=%d truncating to 220 chars fileName: '%s'\n", nameLen, fileName);
    nameLen=220;
    fileName[nameLen]=0;
  }
  sprintf(tmpFileName, "%s -tmp- .htm", fileName);
  // trying to create file:
  if((fileHandle = fopen(tmpFileName, "wb")) == NULL) {
    //printf("advWriteToFile(): Warning: can't create file: '%s' at first step\n", tmpFileName);
    // trying to create DIRs:
    for(j=0; tmpFileName[j]!=0; j++) {
      if(tmpFileName[j]==PATHDELIM || tmpFileName[j]=='/' || tmpFileName[j]=='\\') {
        tmpFileName[j]=0;
        //printf("advWriteToFile(): Making DIR: '%s'\n", tmpFileName);
        makeDir(tmpFileName);
        tmpFileName[j]=PATHDELIM;
      }
    }
    // trying to create file again:
    if((fileHandle = fopen(tmpFileName, "wb")) == NULL) {
      printf("advWriteToFile(): Warning: can't create file: '%s'!!!\n", tmpFileName);
      //waitPressEnter();
      return;
    }
  }
  if(len==0) {
    Mac_logPr8(("advWriteToFile(): WARNING: trying to 
write 0 bytes to file '%s' - nothing written to file\n\n", fileName)); } else { if((fwrite(writeBuf, len, 1, fileHandle)) != 1) { printf("advWriteToFile(): Warning: can't write to file: '%s' %d bytes\n", fileName, len); //waitPressEnter(); } } fclose(fileHandle); unlink(fileName); if(time_t_setDateTime!=0 && time_t_setDateTime!=(unsigned long)-1) { setFileDateTime(tmpFileName, time_t_setDateTime); } rename(tmpFileName, fileName); } void advAppendToFile(char *fileName, void *writeBuf, char *writeBuf2, unsigned long len, unsigned long len2) { FILE *fileHandle; if(len==(unsigned long)-1) { // asuming writeBuf as string with 0 at end ==> count len: while(((char*)writeBuf)[++len] != 0) ; } if(len2==(unsigned long)-1) { // asuming writeBuf as string with 0 at end ==> count len: while(((char*)writeBuf2)[++len2] != 0) ; } // trying to open file for append write: if((fileHandle = fopen(fileName, "a+")) == NULL) { Mac_logPr9(("Can't open for append write file: '%s' - file not found ==> creating", fileName)); advWriteToFile(fileName, writeBuf, len, (unsigned long)-1); return; } if(len==0) { Mac_logPr9(("advAppendToFile(): WARNING: trying to write 0 bytes to file '%s' - nothing written to file\n\n", fileName)); } else { if((fwrite(writeBuf, len, 1, fileHandle)) != 1) { printf("advAppendToFile(): Warning: can't write to file: '%s' %d bytes\n", fileName, len); //waitPressEnter(); } } if(len2==0) { Mac_logPr9(("advAppendToFile(): WARNING: trying to write 0 bytes to file '%s' - nothing written to file\n\n", fileName)); } else { if((fwrite(writeBuf2, len2, 1, fileHandle)) != 1) { printf("advAppendToFile(): Warning: can't write to file: '%s' %d bytes\n", fileName, len2); //waitPressEnter(); } } fclose(fileHandle); } void showMsg(char *sz,...) 
{ char *stringBuf; Mac_new(stringBuf, char, MAXPATH_len, "showMsg() 1"); sprintf(stringBuf, sz, *(ARRLONGS16*)(&sz+1)); printf("%s\n", stringBuf); advAppendToFile(fileLog, stringBuf, "\n", (unsigned long)-1, (unsigned long)-1); Mac_delete(stringBuf); } char *readFile(unsigned long *retLen, char *argsFileName,...) { char fileName[MAXPATH_len]; FILE *fileHandle; char *retContent; sprintf(fileName, argsFileName, *(ARRLONGS16*)(&argsFileName+1)); if((fileHandle = fopen(fileName, "rb")) == NULL) { Mac_logPr7(("Warning: can't open for read file: '%s'\n", fileName)); return NULL; } fseek(fileHandle, 0L, SEEK_END); *retLen = ftell(fileHandle); fseek(fileHandle, 0L, SEEK_SET); Mac_new(retContent, char, (*retLen+1), "readFile() 1"); retContent[*retLen]='\0'; if(*retLen!=0) { if((fread(retContent, *retLen, 1, fileHandle)) != 1) { showError("Warning: can't read from file: '%s' %d bytes", fileName, *retLen); Mac_delete(retContent); retContent = NULL; } } fclose(fileHandle); return retContent; } int dirOpen(char *szRoDirName) { int i = 0; char szDirName[MAXPATH_len]; while((szDirName[i]=szRoDirName[i]) != 0) i++; if(i!=0) szDirName[i++] = PATHDELIM; #ifdef HAVE_DIRENT szDirName[i ] = 0; if((dirReader_dir = opendir(szDirName)) == NULL) { szDirName[i-1] = 0; #else szDirName[i++] = '*'; szDirName[i ] = 0; dirReader_handle = (HANDLE)FindFirstFile(szDirName, (WIN32_FIND_DATA*)dirReader_WIN32_FIND_DATA); szDirName[i-2] = 0; //alert("!!!! 
sizeof(HANDLE)=%d szDirName=%s dirReader_handle=0x%X", sizeof(HANDLE), szDirName, dirReader_handle); if(dirReader_handle == INVALID_HANDLE_VALUE) { #endif lastRetFindNextFile = 0; return 1; // error } #define lastRetFindNextFileFirst 0xFFFFFFFF lastRetFindNextFile = lastRetFindNextFileFirst; return 0; // OK open DIR } char *dirRead() { #ifdef HAVE_DIRENT if((dirReader_rec = readdir((DIR*)dirReader_dir)) == NULL) return NULL; //Mac_logPr1(("############ Util::dirRead(): DT_=%d", (int)((struct dirent*)dirReader_rec)->d_type)); //if((((struct dirent*)dirReader_rec)->d_type&DT_LNK) == DT_LNK) { // Mac_logPr0(("Util::dirRead(): DT_LNK")); // return NULL; //} return ((struct dirent*)dirReader_rec)->d_name; #else if(lastRetFindNextFile == lastRetFindNextFileFirst) { lastRetFindNextFile = 1; } else { if((lastRetFindNextFile = FindNextFile((HANDLE)dirReader_handle, (WIN32_FIND_DATA*)dirReader_WIN32_FIND_DATA)) == 0) { return NULL; } } //alert("entryName=%s, lastRetFindNextFile=0x%X", ((WIN32_FIND_DATA*)dirReader_WIN32_FIND_DATA)->cFileName, lastRetFindNextFile); return ((WIN32_FIND_DATA*)dirReader_WIN32_FIND_DATA)->cFileName; #endif } void dirClose() { #ifdef HAVE_DIRENT if(closedir((DIR*)dirReader_dir) != 0) { #else if(FindClose((HANDLE)dirReader_handle) == 0) { #endif Mac_logPr0(("Util::dirClose(): Unable to close directory")); } } char *getRecursiveDirList(char *baseDir, unsigned long *ret_lenOfListInBytes, unsigned long *ret_numOfLinesInList) { *ret_numOfLinesInList=0; // to collect filenames // (Note: DIR_STACK_PORTION must be at least 2*MAXPATH_len): #define DIR_STACK_PORTION (512*1024) unsigned long dirStackLen = DIR_STACK_PORTION; unsigned long lstStackLen = DIR_STACK_PORTION; char *Mac_new(stackOfDirs, char, dirStackLen, "getDirList() 1"); char *Mac_new(dirFileList, char, lstStackLen, "getDirList() 2"); unsigned long dirFileListPtr=0; unsigned long stackLastStringBeg=0; unsigned long stackLastStringEnd=0; stackOfDirs[stackLastStringEnd++]=1; 
  while((stackOfDirs[stackLastStringEnd]=baseDir[stackLastStringEnd-1]) != '\0') stackLastStringEnd++;
  stackLastStringEnd++;
  while(stackLastStringEnd!=0) {
    if(stackOfDirs[stackLastStringBeg]==2) {
      // if the last entry was already checked and added to the queue, forget it
      // and search for the previous string in stackOfDirs[]:
      stackLastStringEnd=stackLastStringBeg;
      if(stackLastStringBeg==0) break;
      stackLastStringBeg--;
      if(stackOfDirs[stackLastStringBeg]!='\0') Mac_logPr0(("getDirList(): INTERNAL ERROR: unexpected non-zero char"));
      if(stackLastStringBeg!=0) {
        while(--stackLastStringBeg!=0) {
          if(stackOfDirs[stackLastStringBeg]=='\0') break;
        }
      }
      if(stackOfDirs[stackLastStringBeg]=='\0') stackLastStringBeg++;
      continue;
    }
    stackOfDirs[stackLastStringBeg]=2; // set flag "added"
    unsigned int stackLastStringBegDir = stackLastStringBeg;
    if(dirOpen(&stackOfDirs[stackLastStringBeg+1]) != 0) { // if entry is a file (not a DIR):
      // So, it's a file - adding it to dirFileList:
      if(stackLastStringBeg==0) {
        Mac_logPr1(("getDirList(): Unable to open base directory: '%s'\n", &stackOfDirs[stackLastStringBeg+1]));
      }
      char *s = &stackOfDirs[stackLastStringBeg+1];
      // skip baseDir:
      char *bd = baseDir;
      while(*s == *bd) {
        if(*s=='\0') break;
        s++;
        bd++;
      }
      if(*s == '/' || *s == '\\') s++;
      // copy filename (replace '\\' with '/' as the path delimiter):
      while((dirFileList[dirFileListPtr++] = *s++) != '\0') {
        if(dirFileList[dirFileListPtr-1]=='\\') {
          dirFileList[dirFileListPtr-1]='/';
        }
      }
      dirFileList[dirFileListPtr-1] = '\n';
      dirFileList[dirFileListPtr] = '\0';
      (*ret_numOfLinesInList)++;
      // check whether the list needs to be enlarged:
      if(dirFileListPtr > lstStackLen-MAXPATH_len) {
        // enlarging dirFileList:
        char *Mac_new(dirFileList_, char, lstStackLen+DIR_STACK_PORTION, "getDirList() 3");
        memmove(dirFileList_, dirFileList, dirFileListPtr+1);
        Mac_delete(dirFileList);
        dirFileList=dirFileList_;
        lstStackLen+=DIR_STACK_PORTION;
        //alert("getDirList(): needs enlarge dirFileList");
        //Mac_logPr1(("getDirList(): needs enlarge dirFileList"));
      }
    } else {
      if(stackLastStringBeg!=0) {
        char bufPrintf[MAXPATH_len];
        sprintf(bufPrintf, "Reading DIR: '%s'\r", &stackOfDirs[stackLastStringBeg+1]);
        bufPrintf[75]='\0';
        fprintf(stderr, "%s\r", bufPrintf);
      }
      char *entryName;
      while((entryName = dirRead()) != NULL) {
        if(entryName[0]=='.' && entryName[1]=='.' && entryName[2]==0) continue;
        if(entryName[0]=='.' && entryName[1]==0) continue;
        //alert("entryName='%s'", entryName);
        unsigned int stackLastStringEndBak=stackLastStringEnd;
        stackOfDirs[stackLastStringEnd++]=1;
        if(stackLastStringEnd > dirStackLen-MAXPATH_len) {
          // enlarging stackOfDirs:
          char *Mac_new(stackOfDirs_, char, dirStackLen+DIR_STACK_PORTION, "getDirList() 6");
          memmove(stackOfDirs_, stackOfDirs, dirStackLen);
          Mac_delete(stackOfDirs);
          stackOfDirs=stackOfDirs_;
          dirStackLen+=DIR_STACK_PORTION;
        }
        sprintf(&stackOfDirs[stackLastStringEnd], "%s" StringPATHDELIM "%s", &stackOfDirs[stackLastStringBegDir+1], entryName);
        while(stackOfDirs[stackLastStringEnd++] != '\0') ;
        stackLastStringBeg=stackLastStringEndBak;
      }
      dirClose();
    }
  }
  Mac_delete(stackOfDirs);
  *ret_lenOfListInBytes = dirFileListPtr;
  return dirFileList;
}

char **sortListGetArrayOfStringPointers(char *list, unsigned long numOfLines) {
  char **Mac_new(retSortedListOfLines, char*, numOfLines+1, "sortListGetArrayOfStringPointers() 1");
  unsigned int i;
  unsigned int l=0;
  for(i=0; i < numOfLines; i++) {
    retSortedListOfLines[i] = &list[l];
    // find end of line:
    while(list[l++] != '\n') ;
  }
  // SORT with some changed (adapted) ANSI-C qsort():
#define MAXSTACK 1024/*(sizeof(size_t) * CHAR_BIT)*/
  char **loBoundStack[MAXSTACK];
  char **upBoundStack[MAXSTACK];
  unsigned int offset;
  int stackPtr;
  loBoundStack[0] = retSortedListOfLines;
  upBoundStack[0] = &retSortedListOfLines[numOfLines-1];
  for(stackPtr=0; stackPtr>=0; stackPtr--) {
    char **loBound, **lo, **m, **pivot;
    char **upBound, **up, *varTmp;
    loBound = loBoundStack[stackPtr];
    upBound = upBoundStack[stackPtr];
    while(loBound < upBound) {
      // select pivot and exchange with 1st element
      // (loBound[0]):
      offset = (upBound - loBound) >> 1;
      pivot = loBound + offset;
      varTmp=*loBound; *loBound=*pivot; *pivot=varTmp; // (exchange)
      // partition into two segments - presort and count middle:
      lo = loBound+1;
      up = upBound;
      for(;;) {
        while(lo<up) {
          // if(compar(loBound, lo) > 0) lo++;:
          // compareStrings(loBound->wordString, lo->wordString);:
          char *s1 = *loBound;
          char *s2 = *lo;
          if(*s1 == *s2) {
            while(*++s1 == *++s2) {
              if((unsigned char)*s1 == (unsigned char)'\n') break;
            }
          }
          if((unsigned char)*s1 < (unsigned char)*s2) break;
          lo++;
        }
        while(up>=lo) {
          //if(compar(up, loBound) > 0) up--;:
          // compareStrings(up->wordString, loBound->wordString);:
          char *s1 = *up;
          char *s2 = *loBound;
          if(*s1 == *s2) {
            while(*++s1 == *++s2) {
              if((unsigned char)*s1 == (unsigned char)'\n') break;
            }
          }
          if((unsigned char)*s1 < (unsigned char)*s2) break;
          up--;
        }
        if(lo>=up) break;
        varTmp=*lo; *lo=*up; *up=varTmp; // (exchange)
        up--;
        lo++;
      }
      // pivot belongs in A[j]:
      varTmp=*loBound; *loBound=*up; *up=varTmp; // (exchange)
      m = up;
      // keep processing smallest segment, and stack largest:
      if(m-loBound <= upBound-m) {
        if(m + 1 < upBound) {
          loBoundStack[stackPtr] = m + 1;
          upBoundStack[stackPtr++] = upBound;
        }
        upBound = m - 1;
      } else {
        if(m - 1 > loBound) {
          loBoundStack[stackPtr] = loBound;
          upBoundStack[stackPtr++] = m - 1;
        }
        loBound = m + 1;
      }
    }
  }
  return retSortedListOfLines;
}

char *listOfLinesToList(char **sortedListOfLines, unsigned long numOfLinesInList, unsigned long *lenOfListInBytes) {
  // count listLen:
  unsigned int i;
  unsigned long l=0;
  for(i=0; i < numOfLinesInList; i++) {
    unsigned int ii=0;
    while(sortedListOfLines[i][ii++] != '\n') l++;
    l++;
  }
  char *Mac_new(retSortedList, char, l+1, "listOfLinesToList() 1");
  retSortedList[l] = '\0';
  *lenOfListInBytes = l;
  // compiling list:
  l=0;
  for(i=0; i < numOfLinesInList; i++) {
    unsigned int ii=0;
    while((retSortedList[l++] = sortedListOfLines[i][ii++]) != '\n') ;
  }
  if(l != *lenOfListInBytes) {
    Mac_logPr0(("listOfLinesToList(): if(l != *lenOfListInBytes) \
{"));
    waitPressEnter();
    exit(0);
  }
  return retSortedList;
}

unsigned long prepFileNameFromBaseDirAndRelativeName(char *put_fileName, char *baseDir, char *relativeName) {
  unsigned long i=0;
  // copy baseDir:
  while((put_fileName[i++] = *baseDir++) != 0) ;
  i--;
  if(i>1) if(put_fileName[i-1]=='/' || put_fileName[i-1]=='\\') {
    i--;
  }
  put_fileName[i++] = PATHDELIM;
  while((put_fileName[i++] = *relativeName++) != 0) ;
  // replace '/' and '\\' with PATHDELIM:
  for(i=0; put_fileName[i]!='\0'; i++) {
    if(put_fileName[i]=='/' || put_fileName[i]=='\\') {
      put_fileName[i]=PATHDELIM;
    }
  }
  return i;
}

void advMoveFile(char *inpDir, char *inpFileName, char *outDir, char *outFileName) {
  char inpName[MAXPATH_len];
  char outName[MAXPATH_len];
  prepFileNameFromBaseDirAndRelativeName(inpName, inpDir, inpFileName);
  prepFileNameFromBaseDirAndRelativeName(outName, outDir, outFileName);
  int i = rename(inpName, outName);
  while(i!=0) { // if ERROR returned by rename()
    // create the directory structure by writing (and then removing) a dummy file:
    advWriteToFile(outName, "writeBuf[]", 10, (unsigned long)(-1));
    unlink(outName);
    // retry rename after trying to create the directory structure:
    i = rename(inpName, outName);
    if(i!=0) { // if ERROR returned by rename()
      Mac_logPr1(("advMoveFile(): Can not move '%s' to '%s'\n You may try to solve problem and press ENTER when ready (or press CTRL-C to break)", inpName, outName));
      waitPressEnter();
    }
  }
}

void preprocessFile(char *convCmd, char *srcName, char *destName) {
  // preparing command line:
  char commandLine[MAXPATH_len];
  sprintf(commandLine, "\n_____________%s\n", srcName);
  advAppendToFile("preProcLog.txt", commandLine, "", (unsigned long)-1, (unsigned long)-1);
  sprintf(commandLine, "%s \"%s\" \"%s\" >>preProcLog.txt", convCmd, srcName, destName);
  system(commandLine);
}

UCRC getCRC32OfFile(char *baseDir, char *fileName_) {
  char fileName[MAXPATH_len];
  unsigned long i;
  i = prepFileNameFromBaseDirAndRelativeName(fileName, baseDir, fileName_);
  // print current filename:
  if(i>75) {
    fprintf(stderr, "...%s\r",
      &fileName[i-75]);
  } else {
    sprintf(&fileName[i], " ");
    fileName[75]='\0';
    fprintf(stderr, " %s\r", fileName);
    fileName[i]='\0';
  }
  // preprocess file:
  char bakName[MAXPATH_len];
  prepFileNameFromBaseDirAndRelativeName(bakName, bckpDir, fileName_);
  if(convCmd!=NULL) {
    advMoveFile(baseDir, fileName_, bckpDir, fileName_);
    preprocessFile(convCmd, bakName, fileName);
  }
  // read file:
  FILE *fileHandle;
  if((fileHandle = fopen(fileName, "rb")) == NULL) {
    //Mac_logPr1(("getCRC32OfFile(): can't open for read file: '%s'\n", fileName));
    // restore file:
    if(convCmd != NULL) {
      advMoveFile(bckpDir, fileName_, baseDir, fileName_);
      // try to open the file after restoring it from backup:
      if((fileHandle = fopen(fileName, "rb")) == NULL) {
        Mac_logPr1(("getCRC32OfFile():\n can not open for read file: '%s'\n this file was backuped to: '%s'\n", fileName, bakName));
        waitPressEnter();
        //exit(0);
      }
    } else {
      Mac_logPr1(("getCRC32OfFile():\n can not open for read file: '%s'\n", fileName));
      waitPressEnter();
      //exit(0);
    }
  }
  char *fileContent=NULL;
  unsigned long fileLen = 0;
  if(fileHandle!=NULL) {
    fseek(fileHandle, 0L, SEEK_END);
    fileLen = ftell(fileHandle);
    if(fileLen>partCrc) fileLen=partCrc;
    fseek(fileHandle, 0L, SEEK_SET);
    Mac_new(fileContent, char, fileLen+1, "getCRC32OfFile() 1");
    if(fileLen != 0) {
      if((fread(fileContent, fileLen, 1, fileHandle)) != 1) {
        Mac_logPr0(("getCRC32OfFile(): error read file: '%s' from 0 to 0x%X", fileName, fileLen));
        waitPressEnter();
        exit(0);
      }
    }
    fclose(fileHandle);
  }
  // compute CRC32 of file content:
  UCRC crc = CRC_MASK;
  for(i=0; i<fileLen; i++) UPDATE_CRC(crc, fileContent[i]);
  crc = crc ^ 0xffffffffL;
  // free memory:
  if(fileContent!=NULL) Mac_delete(fileContent);
  // delete the backed-up file:
  if(bckpDel) {
    if(convCmd!=NULL) unlink(bakName);
  }
  return crc;
}

UCRC *getArrayCRC32sOfFiles(char *baseDir, char **list, unsigned long numOfLines) {
  UCRC *Mac_new(retCRC32sOfFiles, unsigned long, numOfLines+1, "getArrayCRC32sOfFiles() 1");
  unsigned int i;
  for(i=0; i < numOfLines; i++) {
    unsigned int ii=0;
    // find end of line:
    while(list[i][ii] != '\n') ii++;
    list[i][ii] = '\0';
    retCRC32sOfFiles[i] = getCRC32OfFile(baseDir, list[i]);
    // restore char '\n' after fileName:
    //list[i][ii] = '\n';
  }
  return retCRC32sOfFiles;
}

void makeCrcTable() {
  unsigned int i, j;
  UCRC r;
  for(i=0; i<=255; i++) {
    r=i;
    for(j=8; j>0; j--) {
      // 1110 1101 1011 1000 1000 0011 0010 0000 :
#define CRCPOLY 0xEDB88320UL
      if(r&1) r = (r>>1) ^ CRCPOLY;
      else r >>= 1;
    }
    crcTable[i] = r;
    //printf("%08lx ", crcTable[i]);
    //if(i%8 == 7) printf("\n");
  }
}

void sortCRC32sAndListOfLines(UCRC *arrCRC32s, char **sortedListOfLines, unsigned long numOfLinesInList) {
  // SORT with some changed (adapted) ANSI-C qsort():
#define MAXSTACK 1024/*(sizeof(size_t) * CHAR_BIT)*/
  UCRC *loBoundStack[MAXSTACK];
  UCRC *upBoundStack[MAXSTACK];
  unsigned int offset;
  int stackPtr;
  loBoundStack[0] = arrCRC32s;
  upBoundStack[0] = &arrCRC32s[numOfLinesInList-1];
  for(stackPtr=0; stackPtr>=0; stackPtr--) {
    UCRC *loBound, *lo, *m, *pivot;
    UCRC *upBound, *up, varTmp;
    loBound = loBoundStack[stackPtr];
    upBound = upBoundStack[stackPtr];
    while(loBound < upBound) {
      // select pivot and exchange with 1st element (loBound[0]):
      offset = (upBound - loBound) >> 1;
      pivot = loBound + offset;
      varTmp=*loBound; *loBound=*pivot; *pivot=varTmp; // (exchange)
      char *chrTmp=sortedListOfLines[loBound-arrCRC32s];
      sortedListOfLines[loBound-arrCRC32s]=sortedListOfLines[pivot-arrCRC32s];
      sortedListOfLines[pivot-arrCRC32s]=chrTmp; // (exchange)
      // partition into two segments - presort and count middle:
      lo = loBound+1;
      up = upBound;
      for(;;) {
        while(lo<up) {
          // if(compar(loBound, lo) > 0) lo++;:
          // compareStrings(loBound->wordString, lo->wordString);:
          if(*loBound < *lo) break;
          if(*loBound == *lo) if(sortedListOfLines[loBound-arrCRC32s] < sortedListOfLines[lo-arrCRC32s]) break;
          lo++;
        }
        while(up>=lo) {
          //if(compar(up, loBound) > 0) up--;:
          // compareStrings(up->wordString, loBound->wordString);:
          if(*up < *loBound) break;
          if(*up == *loBound)
            if(sortedListOfLines[up-arrCRC32s] < sortedListOfLines[loBound-arrCRC32s]) break;
          up--;
        }
        if(lo>=up) break;
        varTmp=*lo; *lo=*up; *up=varTmp; // (exchange)
        chrTmp=sortedListOfLines[lo-arrCRC32s];
        sortedListOfLines[lo-arrCRC32s]=sortedListOfLines[up-arrCRC32s];
        sortedListOfLines[up-arrCRC32s]=chrTmp; // (exchange)
        up--;
        lo++;
      }
      // pivot belongs in A[j]:
      varTmp=*loBound; *loBound=*up; *up=varTmp; // (exchange)
      chrTmp=sortedListOfLines[loBound-arrCRC32s];
      sortedListOfLines[loBound-arrCRC32s]=sortedListOfLines[up-arrCRC32s];
      sortedListOfLines[up-arrCRC32s]=chrTmp; // (exchange)
      m = up;
      // keep processing smallest segment, and stack largest:
      if(m-loBound <= upBound-m) {
        if(m + 1 < upBound) {
          loBoundStack[stackPtr] = m + 1;
          upBoundStack[stackPtr++] = upBound;
        }
        upBound = m - 1;
      } else {
        if(m - 1 > loBound) {
          loBoundStack[stackPtr] = loBound;
          upBoundStack[stackPtr++] = m - 1;
        }
        loBound = m + 1;
      }
    }
  }
}

int compareTwoFiles(char *baseDir, char *fileName1_, char *fileName2_) {
  // return 0 if equal files
  // return 1 if 1-st file longer than 2-nd file
  // return 2 if 2-nd file longer than 1-st file
  // return 3 if differ
  char fileName1[MAXPATH_len];
  char fileName2[MAXPATH_len];
  prepFileNameFromBaseDirAndRelativeName(fileName1, baseDir, fileName1_);
  prepFileNameFromBaseDirAndRelativeName(fileName2, baseDir, fileName2_);
  FILE *fileHandle1;
  FILE *fileHandle2;
  if((fileHandle1 = fopen(fileName1, "rb")) == NULL) {
    printf("compareTwoFiles(): can't open for read file: '%s'\n", fileName1);
    waitPressEnter();
    //exit(80);
    return 3;
  }
  if((fileHandle2 = fopen(fileName2, "rb")) == NULL) {
    printf("compareTwoFiles(): can't open for read file: '%s'\n", fileName2);
    waitPressEnter();
    //exit(81);
    fclose(fileHandle1);
    return 3;
  }
  fseek(fileHandle1, 0L, SEEK_END);
  fseek(fileHandle2, 0L, SEEK_END);
  unsigned long fileLen1 = ftell(fileHandle1);
  unsigned long fileLen2 = ftell(fileHandle2);
  fseek(fileHandle1, 0L, SEEK_SET);
  fseek(fileHandle2, 0L, SEEK_SET);
#define BUF_SIZE_FOR_COMPARE (4096*1024)
  unsigned
  long compareLen = fileLen1;
  if(fileLen2<compareLen) compareLen=fileLen2;
  unsigned long bufSize = BUF_SIZE_FOR_COMPARE;
  if(compareLen<bufSize) bufSize=compareLen;
  char *Mac_new(buf1, char, bufSize+1, "compareTwoFiles() 1");
  char *Mac_new(buf2, char, bufSize+1, "compareTwoFiles() 2");
  unsigned long restBytes = compareLen;
  while(restBytes!=0) {
    unsigned long readPortion=bufSize;
    if(restBytes<readPortion) readPortion=restBytes;
    // read to buf1[], buf2[] from files:
    if((fread(buf1, readPortion, 1, fileHandle1)) != 1) {
      printf("compareTwoFiles(): can't read from file: '%s' %lu bytes", fileName1, readPortion);
      waitPressEnter();
      exit(82);
    }
    if((fread(buf2, readPortion, 1, fileHandle2)) != 1) {
      printf("compareTwoFiles(): can't read from file: '%s' %lu bytes", fileName2, readPortion);
      waitPressEnter();
      exit(83);
    }
    // compare:
    unsigned int i;
    for(i=0; i<readPortion; i++) {
      if(buf1[i] != buf2[i]) { // if differences:
        fclose(fileHandle1);
        fclose(fileHandle2);
        Mac_delete(buf1);
        Mac_delete(buf2);
        return 3;
      }
    }
    restBytes-=readPortion;
  }
  // So, no differences encountered in compared part:
  fclose(fileHandle1);
  fclose(fileHandle2);
  Mac_delete(buf1);
  Mac_delete(buf2);
  if(fileLen1>fileLen2) return 1; // if no diffs, fileSize1>fileSize2
  if(fileLen2>fileLen1) return 2; // if no diffs, fileSize2>fileSize1
  return 0; // if no diffs, equal sizes (equal file-contents)
}

void checkAndProcessDupFiles(char *baseDir, char **fileList, unsigned long numOfFilesToCheck) {
  char *Mac_new(isDeletedFlags, char, numOfFilesToCheck+1, "checkAndProcessDupFiles() 1");
  memset(isDeletedFlags, 0, numOfFilesToCheck+1);
  Mac_logPr5(("_____________________________________________________________________________\ncheckAndProcessDupFiles(): processing files:"));
  unsigned int i;
  for(i=0; i < numOfFilesToCheck; i++) {
    Mac_logPr5((" %s", fileList[i]));
  }
  // checking each file[i] from end to beg:
  for(i=numOfFilesToCheck-1; i!=((unsigned int)(-1)); i--) {
    if(isDeletedFlags[i]!=0) continue; // if file[i] was deleted
    // if
    // file[i] equals to one of files[j] (j=[0]...[i-1]) ==>
    // ==> delete file[i]:
    unsigned int j;
    for(j=0; j<i; j++) {
      if(isDeletedFlags[j]!=0) continue; // if file[j] was deleted
      char reportString[MAXPATH_len];
      // So, files [j] and [i] not deleted ==> comparing:
      int cmpResult=compareTwoFiles(baseDir, fileList[j], fileList[i]);
      if(cmpResult==0) { // if equal files:
        Mac_logPr5(("Equal files: \"%s\" equals to \"%s\" ==> removing 2-nd file: \"%s\"", fileList[j], fileList[i], fileList[i]));
        sprintf(reportString, "Eq :\t\"%s\"\t\"%s\"", fileList[j], fileList[i]);
        advAppendToFile(rprtTxt, reportString, "\n", (unsigned long)-1, (unsigned long)-1);
        if(dupsDel) advMoveFile(baseDir, fileList[i], tempDir, fileList[i]);
        numOfDupFiles++;
        isDeletedFlags[i]=1;
        break;
      } else if(cmpResult==1) { // if 1-st file longer than 2-nd file:
        Mac_logPr5(("1-st file longer than 2-nd file (but common part equal): \"%s\" longer than \"%s\" ==> removing 2-nd file: \"%s\"", fileList[j], fileList[i], fileList[i]));
        sprintf(reportString, "Eq1:\t\"%s\"\t\"%s\"", fileList[j], fileList[i]);
        advAppendToFile(rprtTxt, reportString, "\n", (unsigned long)-1, (unsigned long)-1);
        if(dupsDel) advMoveFile(baseDir, fileList[i], tempDir, fileList[i]);
        numOfDupFiles++;
        isDeletedFlags[i]=1;
        break;
      } else if(cmpResult==2) { // if 2-nd file longer than 1-st file:
        Mac_logPr5(("2-nd file longer than 1-st file (but common part equal): \"%s\" smaller than \"%s\" ==> removing 1-st file: \"%s\"", fileList[j], fileList[i], fileList[j]));
        sprintf(reportString, "Eq2:\t\"%s\"\t\"%s\"", fileList[j], fileList[i]);
        advAppendToFile(rprtTxt, reportString, "\n", (unsigned long)-1, (unsigned long)-1);
        if(dupsDel) advMoveFile(baseDir, fileList[j], tempDir, fileList[j]);
        numOfDupFiles++;
        isDeletedFlags[j]=1;
      } else {
        Mac_logPr5(("Differ files (but equal CRC32 in part 0...%d): \"%s\" differ from \"%s\" ==> do not removing anything", partCrc, fileList[j], fileList[i]));
        sprintf(reportString, "Eq-:\t\"%s\"\t\"%s\"", fileList[j], fileList[i]);
        advAppendToFile(rprtTxt, reportString, "\n", (unsigned long)-1, (unsigned long)-1);
      }
    }
  }
  advAppendToFile(rprtTxt, "", "\n", (unsigned long)-1, (unsigned long)-1);
  Mac_delete(isDeletedFlags);
}

int main(int argc, char *argv[]) {
  tzSecondsShift = getTzSecondsShift();
  makeCrcTable();
  char *baseDir = NULL;
  // read command line options:
  if(argc<2) usage();
  optionVerbose |= getArg(NULL, argv, "-v", "+");
  optionVerbose |= getArg(NULL, argv, "-V", "+");
  if(getArg(NULL, argv, "-?", "+")) usage();
  if(getArg(NULL, argv, "-h", "+")) usage();
  if(getArg(NULL, argv, "-H", "+")) usage();
  getArg(&fileLog, argv, "-fileLog", "fileLog.txt");
  getArg(&convLog, argv, "-convLog", "convLog.txt");
  getArg(&rprtTxt, argv, "-rprtTxt", "report.txt");
  partCrc = getArg(NULL, argv, "-partCrc", "4096");
  getArg(&baseDir, argv, "-baseDir", ".");
  getArg(&bckpDir, argv, "-bckpDir", "#backup$");
  getArg(&tempDir, argv, "-tempDir", "TEMP");
  getArg(&convCmd, argv, "-convCmd", "");
  if(convCmd[0]=='\0') convCmd=NULL;
  getArg(&saveLst, argv, "-saveLst", "");
  if(saveLst[0]=='\0') saveLst=NULL;
  bckpDel = getArg(NULL, argv, "-bckpDel", "+");
  dupsDel = getArg(NULL, argv, "-dupsDel", "+");
  unsigned int i;
#ifndef HAVE_DIRENT
  Mac_new(dirReader_WIN32_FIND_DATA, WIN32_FIND_DATA, 1, "Util::Util() 5");
#endif
  unsigned long lenOfListInBytes=0;
  unsigned long numOfLinesInList=0;
  Mac_logPr5(("Scanning directory tree of dir '%s'", baseDir));
  char *fileList = getRecursiveDirList(baseDir, &lenOfListInBytes, &numOfLinesInList);
  //writeToFile("unsortedfileList", fileList, lenOfListInBytes);
  fprintf(stderr, "\r \r");
  Mac_logPr5(("Sorting list of files..."));
  char **sortedListOfLines = sortListGetArrayOfStringPointers(fileList, numOfLinesInList);
  char *sortedFileList = listOfLinesToList(sortedListOfLines, numOfLinesInList, &lenOfListInBytes);
  // get optimized list:
  char **finalListOfLines = sortListGetArrayOfStringPointers(sortedFileList, numOfLinesInList);
  // checking list (must be sequentially):
  if(numOfLinesInList != 0) {
    for(i=0; i < numOfLinesInList-1; i++) {
      if(finalListOfLines[i] >= finalListOfLines[i+1]) {
        Mac_logPr0(("main(): internal error: bad sort of fileList\n"));
        waitPressEnter();
        exit(0);
      }
    }
  }
  Mac_delete(fileList);
  Mac_delete(sortedListOfLines);
  sortedListOfLines = finalListOfLines;
  if(saveLst!=NULL) {
    Mac_logPr5(("Saving list of files to file: '%s'", saveLst));
    writeToFile(saveLst, sortedFileList, lenOfListInBytes);
  }
  Mac_logPr5(("%sGet hashes (CRC32 of beg %d Bytes) of each file in list", convCmd!=NULL?"Convert files, ":"", partCrc));
  // get hashes (CRC32) of all files in list:
  UCRC *arrCRC32s = getArrayCRC32sOfFiles(baseDir, finalListOfLines, numOfLinesInList);
  //Mac_logPr5(("Saving CRC table to file: 'arrCRC32s' "));
  //writeToFile("arrCRC32s", arrCRC32s, numOfLinesInList*sizeof(UCRC));
  Mac_logPr5(("Sorting CRC32 values of all files"));
  sortCRC32sAndListOfLines(arrCRC32s, sortedListOfLines, numOfLinesInList);
  //Mac_logPr5(("Saving sorted CRC table to file: 'arrCRC32s' "));
  //writeToFile("arrCRC32sSorted", arrCRC32s, numOfLinesInList*sizeof(UCRC));
  Mac_logPr5(("Find equal hashes, check%s, report about duplicate files", dupsDel?", move":""));
  if(numOfLinesInList > 1) {
    for(i=0; i < numOfLinesInList-1; i++) {
      if(arrCRC32s[i] == arrCRC32s[i+1]) {
        unsigned int ii;
        for(ii=i; ii < numOfLinesInList; ii++) {
          if(arrCRC32s[ii]!=arrCRC32s[i]) break;
        }
        checkAndProcessDupFiles(baseDir, &finalListOfLines[i], ii-i);
        i = ii-1;
      }
    }
  }
  Mac_logPr5(("\n\nOK: Completed: %sFind%s Duplicate Files: numOfDupFiles=%d\n", convCmd!=NULL?"Convert files, ":"", dupsDel!=0?", Remove":"", numOfDupFiles));
  Mac_delete(arrCRC32s);
  Mac_delete(sortedFileList);
  Mac_delete(sortedListOfLines);
#ifndef HAVE_DIRENT
  Mac_delete(dirReader_WIN32_FIND_DATA);
#endif
  waitPressEnter();
  return 0;
}
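
The listing hashes only the leading part of each file (the -partCrc length, 4096 bytes by default) with a table-driven CRC32, and treats equal hashes as candidates for a full byte-by-byte comparison. The definitions of UPDATE_CRC and CRC_MASK are not shown in this listing, so the following is only a minimal standalone sketch that assumes the conventional CRC-32 scheme: initial value 0xFFFFFFFF, reflected polynomial 0xEDB88320 (the same constant as CRCPOLY), and a final XOR with 0xFFFFFFFF. The names crc32_make_table and crc32_prefix are hypothetical and do not come from the program.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected CRC-32 polynomial; same constant as CRCPOLY in the listing. */
#define CRC32_POLY 0xEDB88320u

static uint32_t crc32_table[256];

/* Fill the 256-entry lookup table, one entry per possible byte value.
   Must be called once before crc32_prefix(). */
static void crc32_make_table(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t r = i;
        for (int bit = 0; bit < 8; bit++)
            r = (r & 1u) ? (r >> 1) ^ CRC32_POLY : (r >> 1);
        crc32_table[i] = r;
    }
}

/* CRC-32 of at most max_len leading bytes of buf.
   Hypothetical helper: assumes init 0xFFFFFFFF and final XOR 0xFFFFFFFF,
   which may differ from the program's own UPDATE_CRC / CRC_MASK. */
static uint32_t crc32_prefix(const unsigned char *buf, size_t len, size_t max_len)
{
    uint32_t crc = 0xFFFFFFFFu;
    if (len > max_len)
        len = max_len;              /* hash only the leading part */
    for (size_t i = 0; i < len; i++)
        crc = crc32_table[(crc ^ buf[i]) & 0xFFu] ^ (crc >> 8);
    return crc ^ 0xFFFFFFFFu;
}
```

Under these assumptions, calling crc32_make_table() once and then crc32_prefix() on the nine ASCII bytes "123456789" yields the standard CRC-32 check value 0xCBF43926. Two files with different prefix hashes cannot share the same leading bytes, while equal hashes prove nothing on their own, which is why the program follows up with compareTwoFiles().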
Created:
October 17, 2000,
http://www.chat.ru/~vitaliy_vasiliev/
http://free.prohosting.com/~vitivas/